


THE JOURNAL OF 
EDUCATIONAL PSYCHOLOGY 


Volume XVIII March, 1927 Number 3 














ON THE FORMATION OF STRUCTURE DIAGRAMS 
BETWEEN FOUR CORRELATED VARIABLES 


GODFREY H. THOMSON 
Moray House, University of Edinburgh 


I. INTRODUCTION 


In this Journal for May, 1926, on page 306, I published some 
structure diagrams as an aid to thinking out the interpretation of 
Burt’s regression equation. I have had many enquiries concerning 
the method of making these, other than by trial and error, and the 
object of the present paper is to set out some ways of doing this. 

If the score x in an intelligence test is due to a number of small
causes m, n, p, etc., so that

$$x = Am + Bn + Cp + \text{etc.},$$

where some of the coefficients A, B, C may, of course, be zero, and if
the score y in another test be given by

$$y = A'm + B'n + C'p + \text{etc.},$$

then the correlation between x and y is given by

$$r_{xy} = \frac{AA'\sigma_m^2 + BB'\sigma_n^2 + CC'\sigma_p^2 + \text{etc.}}{\sigma_x\,\sigma_y}.$$

If the small causes are equal to one another this becomes¹

$$r_{xy} = \frac{AA' + BB' + CC' + \text{etc.}}{\sqrt{(A^2 + B^2 + C^2 + \text{etc.})\,(A'^2 + B'^2 + C'^2 + \text{etc.})}}.$$


1 See page 99, British Journal of Psychology, Nov., 1919, Vol. X, in an appendix to
a paper by Mr. J. R. Thompson, to whom I supplied these formulae before May
1919. They have also been proved independently by J. C. M. Garnett, Proceedings
of the Royal Society London, Series A, Vol. XCVI, September, 1919 (see
between equations 15 and 16). I learn privately that they have recently again
been proved independently by Professor J. Clark Hull of Wisconsin. I obtained
them by transcription into modern notation from Bravais, Mémoires de l'Institut
de France, 1846; as did also Kapteyn, Monthly Notices of the Astronomical Society,
1912, LXXII, 518, whose paper I came across later.



If in addition the small causes are “all or none” in their action, as are
neurones, then all the coefficients A, B, C, A', B', C' equal either
unity or zero and the formula becomes¹

$$r_{xy} = \frac{\text{number of common causes}}{\text{geometrical mean of the two totals}}.$$
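The two formulas above can be checked numerically with a small Python sketch. The function names and the ten-cause example are hypothetical and do not come from the paper. When every coefficient is 0 or 1, the general formula reduces to the count of common causes divided by the geometric mean of the two totals.

```python
from math import sqrt

def r_general(a, b):
    """General formula for equal small causes: a[k] and b[k] are the
    coefficients of cause k in the two scores."""
    num = sum(ak * bk for ak, bk in zip(a, b))
    return num / sqrt(sum(ak * ak for ak in a) * sum(bk * bk for bk in b))

def r_all_or_none(n_common, n_x, n_y):
    """Common causes divided by the geometric mean of the two totals."""
    return n_common / sqrt(n_x * n_y)

# Hypothetical example: ten causes; x uses causes 0-5, y uses causes 3-9.
a = [1 if k <= 5 else 0 for k in range(10)]
b = [1 if k >= 3 else 0 for k in range(10)]
n_common = sum(ak * bk for ak, bk in zip(a, b))   # causes 3, 4 and 5

print(r_general(a, b))
print(r_all_or_none(n_common, sum(a), sum(b)))    # same value, 3 / sqrt(42)
```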


The present paper is solely concerned with diagrams assuming 
equal, all-or-none elements, possibly neurones. Interference factors
are not here considered. J. R. Thompson gave a general method of 
making such diagrams for three variables in 1919 (loc. cit.). The 
problem for four variables is considerably more complicated. I 
use the following diagrams and rules as my tools. It will be recalled 
that there are six correlations between four variables. 







II. RULES WHICH LEAVE THE COMPLEXITY OF THE VARIABLES
UNCHANGED


1. To increase one correlation, say r₁₂, without changing the others.

Method (A).—Increase the elements common to x₁ and x₂ and to
no others, and by the same amount decrease each of the specifics of
x₁ and of x₂.





Fig. 1. 





1 I had already arrived at this independently before obtaining the general
formula (see British Journal of Psychology, 1916, VIII, p. 275). On page 282
Professor Spearman claimed it as a special case of his formula for sums or differ-
ences, as indeed it is; but that is not how I obtained it.







Method (B).—Keeping entirely within the space common to x₃
and x₄, mark the white¹ areas minus and the shaded areas plus (mean-
ing that the same number of elements must be added to or subtracted from each area so marked).


Fig. 2.


Method (C).—Keeping entirely inside x₃ and outside x₄ (or inside
x₄ and outside x₃), mark the white areas plus and the shaded areas
minus.
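The three rules can be checked by bookkeeping on the fifteen cells of the four-variable figure, one cell for every non-empty subset of x₁ to x₄. In this Python sketch (the helper names are ours), each pattern is a dict mapping a cell to the number of elements added (+) or removed (−). The sketch assumes the reading of the subscripts given above: B works inside x₃ and x₄, and C works inside one of them and outside the other. Under that reading, each pattern raises the common count of x₁ and x₂ by one element and leaves every total, and every other common count, unchanged.

```python
from itertools import combinations

VARS = (1, 2, 3, 4)
# The fifteen cells: every non-empty subset of the four variables.
CELLS = [frozenset(c) for k in range(1, 5) for c in combinations(VARS, k)]

def sign(cell):
    """+1 for a white cell (odd number of variables), -1 for a shaded one."""
    return 1 if len(cell) % 2 else -1

def others(*vs):
    return [v for v in VARS if v not in vs]

def method_a(i, j):
    # More elements in the mutual cell of x_i and x_j, fewer in both specifics.
    return {frozenset({i, j}): 1, frozenset({i}): -1, frozenset({j}): -1}

def method_b(i, j):
    # Within the space common to the other two variables: white minus, shaded plus.
    k, l = others(i, j)
    return {cell: -sign(cell) for cell in CELLS if {k, l} <= cell}

def method_c(i, j, inside):
    # Inside x_inside and outside the remaining variable: white plus, shaded minus.
    (outside,) = others(i, j, inside)
    return {cell: sign(cell) for cell in CELLS
            if inside in cell and outside not in cell}

def effect(pattern):
    """Change in each total and in each pair's number of common elements."""
    totals = {i: sum(n for cell, n in pattern.items() if i in cell) for i in VARS}
    commons = {(i, j): sum(n for cell, n in pattern.items() if {i, j} <= cell)
               for i, j in combinations(VARS, 2)}
    return totals, commons

for name, pattern in [("A12", method_a(1, 2)), ("B12", method_b(1, 2)),
                      ("C12.3", method_c(1, 2, 3)), ("C12.4", method_c(1, 2, 4))]:
    totals, commons = effect(pattern)
    changed = {pair: n for pair, n in commons.items() if n}
    print(name, "totals unchanged:", not any(totals.values()), "changed:", changed)
```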

2. To alter a diagram without changing any of the correlations. 

This can be done by various combinations of the above, such as 
increasing r₁₂ by A and decreasing it again by B. The following among
others have proved useful. 

Method (D).—Leaving x₁ untouched, mark all other white spaces
plus, and all other shaded spaces minus, or vice versa. D = C − A.
D₂, D₃, and D₄ are similar diagrams in which x₂, x₃, and x₄, respectively,
are left alone.

Method (E) is most simply described as A − B.





1 The white areas are those common to an odd number of variables (one or
three). The shaded areas are common to an even number of variables (two or
four). The cells of the figure are 1 “general” cell in the middle, 4 “triple” cells,
6 double or “mutual” cells, 4 “specific” cells, and the outer space, giving the
binomial distribution 1, 4, 6, 4, 1.







Fig. 3.—C₁₂.₃ means a C diagram (there are twelve C diagrams in all) altering r₁₂
by means of changes within x₃. Mutatis mutandis any one of the other five correla-
tions can be increased, and of course changing all signs gives a decrease.









There are six varieties of this diagram. E₁₂ has changes in the
specifics of x₁ and x₂; E₃₄ has changes in the specifics of x₃ and x₄;
and so on.


Fig. 5. No change.


Method (F) is formed from C − B and hardly needs a diagram.
Remaining inside one variable, mark all whites plus, all shaded areas
minus, or vice versa. There are four varieties, F₁, F₂, etc.
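The no-change patterns can be built from A, B and C as the text describes (D = C − A, E = A − B, F = C − B) and checked in one loop. This Python sketch uses our own helper names. The decompositions D₁ = C₂₃.₄ − A₂₃ and F₁ = C₂₃.₁ − B₂₃ are one choice consistent with the white/shaded rules, and the last two assertions check them against those rules.

```python
from collections import Counter
from itertools import combinations

VARS = (1, 2, 3, 4)
CELLS = [frozenset(c) for k in range(1, 5) for c in combinations(VARS, k)]

def sign(cell):
    """+1 for a white cell (odd number of variables), -1 for a shaded one."""
    return 1 if len(cell) % 2 else -1

def others(*vs):
    return [v for v in VARS if v not in vs]

def method_a(i, j):
    return {frozenset({i, j}): 1, frozenset({i}): -1, frozenset({j}): -1}

def method_b(i, j):
    k, l = others(i, j)
    return {cell: -sign(cell) for cell in CELLS if {k, l} <= cell}

def method_c(i, j, inside):
    (outside,) = others(i, j, inside)
    return {cell: sign(cell) for cell in CELLS
            if inside in cell and outside not in cell}

def minus(p, q):
    """Cell-by-cell difference p - q, dropping cells that cancel out."""
    out = Counter(p)
    out.subtract(q)
    return {cell: n for cell, n in out.items() if n}

def effect(pattern):
    """Changes in the four totals followed by the six common counts."""
    totals = [sum(n for cell, n in pattern.items() if i in cell) for i in VARS]
    commons = [sum(n for cell, n in pattern.items() if {i, j} <= cell)
               for i, j in combinations(VARS, 2)]
    return totals + commons

def method_d(k):
    # D = C - A: raise r_ij inside x_l (outside x_k), then lower it again by A.
    i, j, l = others(k)
    return minus(method_c(i, j, l), method_a(i, j))

def method_e(i, j):
    return minus(method_a(i, j), method_b(i, j))          # E = A - B

def method_f(k):
    # F = C - B: raise r_ij inside x_k, then lower it again by B.
    i, j, _ = others(k)
    return minus(method_c(i, j, k), method_b(i, j))

patterns = {f"D{k}": method_d(k) for k in VARS}
patterns.update({f"E{i}{j}": method_e(i, j) for i, j in combinations(VARS, 2)})
patterns.update({f"F{k}": method_f(k) for k in VARS})

for name, pattern in patterns.items():
    assert effect(pattern) == [0] * 10, name

# The white/shaded descriptions of D1 and F1 agree with these constructions.
assert method_d(1) == {cell: sign(cell) for cell in CELLS if 1 not in cell}
assert method_f(1) == {cell: sign(cell) for cell in CELLS if 1 in cell}
print(len(patterns), "patterns leave every total and correlation unchanged")
```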


III. APPLICATION TO FOUR EQUALLY COMPLEX VARIABLES


Let us employ the above rules in order to make structure diagrams
illustrating the set of correlation coefficients:

r₁₂ = .72   r₁₃ = .39   r₁₄ = .46
r₂₃ = .51   r₂₄ = .67   r₃₄ = .55


In the present section we will confine ourselves to diagrams in
which each variable is composed of 100 factors, as being mathematically
the simplest. First place 39 factors in the general cell, the
smallest r being .39. Then place factors into each “mutual” area to
make up this 39 to the number equal to 100r for that pair of variables.
Thirdly, place enough specific factors into each variable to make the
total 100. This, however, may, as in the present instance, result in a
negative number, which we do not want.¹

Fig. 6.

Search therefore among those standard diagrams which do not change
the correlations for one which can be used to increase the specifics of x₂.
D₁, for example, can be so used here, because there are sufficient factors
in the cells which D₁ depletes. Add 12D₁ and we obtain the following:


Fig. 7.





1 A negative number in a cell is only used here as a transition stage to a correct 
diagram: for a number of elements which are subtracted from a variable has 
nevertheless to be included in the total number in that variable, in our formula for 
r, and the diagrams here containing negative cells would not give the correct r’s. 

These negative elements are not interference elements, which have to be 
positive in one variable and negative in another. 







Each variable will now be found to have 100 factors, and the common 
factors are such as to give the required six correlations. 
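The three construction steps and the correction by D₁ can be replayed numerically in this Python sketch (the helper names are ours). The subscript of D is our reading. D₁ is the only D pattern that both adds to the specific of x₂ and finds enough elements in the cells it depletes. Step 3 gives x₂ a specific of −12. Adding 12D₁ then leaves every cell non-negative, every total at 100, and every correlation as required.

```python
from itertools import combinations
from math import sqrt

VARS = (1, 2, 3, 4)
R = {(1, 2): .72, (1, 3): .39, (1, 4): .46, (2, 3): .51, (2, 4): .67, (3, 4): .55}
SIZE = 100                      # factors in every variable

def S(*vs):
    return frozenset(vs)

def total(diagram, i):
    return sum(n for cell, n in diagram.items() if i in cell)

def r(diagram, i, j):
    common = sum(n for cell, n in diagram.items() if {i, j} <= cell)
    return common / sqrt(total(diagram, i) * total(diagram, j))

# Step 1: 100 times the smallest r goes into the general cell.
general = round(SIZE * min(R.values()))
diagram = {S(*VARS): general}
# Step 2: each mutual cell brings the common count up to 100 r for its pair.
for (i, j), rij in R.items():
    diagram[S(i, j)] = round(SIZE * rij) - general
# Step 3: a specific cell fills each variable up to 100.
for i in VARS:
    diagram[S(i)] = SIZE - total(diagram, i)
fig6_specifics = {i: diagram[S(i)] for i in VARS}
print(fig6_specifics)           # the specific of x2 comes out negative

# Add 12 D1: outside x1, white (odd) cells plus and shaded (even) cells minus.
for k in (1, 2, 3):
    for c in combinations((2, 3, 4), k):
        diagram[S(*c)] = diagram.get(S(*c), 0) + 12 * (1 if k % 2 else -1)

print(min(diagram.values()), [total(diagram, i) for i in VARS])
print({p: round(r(diagram, *p), 2) for p in R})
```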

Any of the four patterns D or the six patterns E can now be used
to alter the structure further without changing the correlations, provided
that those cells which the pattern depletes have sufficient contents.
Let us, for example, reduce the general factor by adding 16E₁₃, giving:





Fig. 8.


If we now add 22E₃₄, we reduce the general factor to a very small
influence without changing any correlations:





Fig. 9.







The application of method F to the variable x₁ permits, in this instance,
the total extinction of the general factor, thus:





Fig. 10. 


The illustrations of this section have been restricted to the same 
number of elements in each variable. The next section removes this 
limitation. 
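Figs. 8 to 10 can be reproduced from the cell counts of Fig. 7. The starting counts below are computed from the construction steps of this section, not transcribed from the figure. We read the printed multiples as 16E₁₃ and 22E₃₄. Under that reading, each multiple is the largest the depleted cells allow, and a single F₁ then removes the last general element. In this Python sketch (our own helper names), `add` stops with an error if any cell would go negative.

```python
from itertools import combinations

VARS = (1, 2, 3, 4)
S = frozenset

# Fig. 7 as it follows from the construction of this section (factors per cell).
diagram = {S({1, 2, 3, 4}): 39,
           S({1, 2}): 33, S({1, 3}): 0, S({1, 4}): 7,
           S({2, 3}): 0, S({2, 4}): 16, S({3, 4}): 4,
           S({2, 3, 4}): 12,
           S({1}): 21, S({2}): 0, S({3}): 45, S({4}): 22}

def pattern_e(i, j):
    """E_ij = A_ij - B_ij, written out cell by cell."""
    k, l = [v for v in VARS if v not in (i, j)]
    return {S({i, j}): 1, S({i}): -1, S({j}): -1, S(VARS): -1,
            S({i, k, l}): 1, S({j, k, l}): 1, S({k, l}): -1}

def pattern_f(k):
    """F_k: inside x_k, white (odd) cells plus and shaded (even) cells minus."""
    return {S(c): (1 if n % 2 else -1)
            for n in range(1, 5) for c in combinations(VARS, n) if k in c}

def add(pattern, times):
    for cell, n in pattern.items():
        diagram[cell] = diagram.get(cell, 0) + times * n
    assert min(diagram.values()) >= 0, "a cell went negative"

def total(i):
    return sum(n for cell, n in diagram.items() if i in cell)

def common(i, j):
    return sum(n for cell, n in diagram.items() if {i, j} <= cell)

add(pattern_e(1, 3), 16)   # Fig. 8: general cell 39 -> 23
add(pattern_e(3, 4), 22)   # Fig. 9: general cell 23 -> 1
add(pattern_f(1), 1)       # Fig. 10: general cell 1 -> 0

print("general cell:", diagram[S(VARS)])
print("totals:", [total(i) for i in VARS])
print("common counts:", {p: common(*p) for p in combinations(VARS, 2)})
```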


IV. A GENERAL FORMULA 
The following diagram will always give the correlations r₁₂, r₁₃,
r₂₃, r₁₄, r₂₄, r₃₄ for any values of m₁, m₂, m₃, m₄, n₁, n₂, n₃, n₄, and q,
provided none of the cells are negative.

Fig. 11.








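Herein

$$C_1 = \frac{r_{23}}{r_{12}\,r_{13}}, \qquad D_1 = r_{14}\sqrt{C_1},$$

$$C_2 = \frac{r_{13}}{r_{12}\,r_{23}}, \qquad D_2 = r_{24}\sqrt{C_2},$$

$$C_3 = \frac{r_{12}}{r_{13}\,r_{23}}, \qquad D_3 = r_{34}\sqrt{C_3}.$$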


It reduces to J. Ridley Thompson’s diagram for three variables
if the outline of x₄ is deleted and n₄ put equal to zero.

The total numbers of “all or none” elements in the four variables
are C₁m₁², C₂m₂², C₃m₃², and m₄², respectively.
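Here is a Python sketch of the general diagram with every n and q set to zero, the case used in the illustrations that follow; the function names are ours. Since √(C₁C₂) = 1/r₁₂ and √C₁ = D₁/r₁₄, the mutual cells m₁m₂ and D₁m₁m₄ give exactly r₁₂ and r₁₄, and likewise for the other pairs. The check below uses arbitrary, hypothetical m's. It shows that the correlations come out right whatever the m's are, so in practice the only constraint is that no cell be negative.

```python
from math import sqrt

def general_diagram(r, m):
    """Cells of the general diagram with every n and q set to zero.

    r maps pairs (i, j), i < j, to r_ij; m = (m1, m2, m3, m4).
    Returns a dict from frozenset cells to (possibly negative) counts."""
    m = dict(zip((1, 2, 3, 4), m))
    C = {1: r[2, 3] / (r[1, 2] * r[1, 3]),
         2: r[1, 3] / (r[1, 2] * r[2, 3]),
         3: r[1, 2] / (r[1, 3] * r[2, 3])}
    D = {i: r[i, 4] * sqrt(C[i]) for i in (1, 2, 3)}
    totals = {1: C[1] * m[1] ** 2, 2: C[2] * m[2] ** 2,
              3: C[3] * m[3] ** 2, 4: m[4] ** 2}
    cells = {frozenset({i, j}): m[i] * m[j] for i, j in ((1, 2), (1, 3), (2, 3))}
    cells.update({frozenset({i, 4}): D[i] * m[i] * m[4] for i in (1, 2, 3)})
    for i in (1, 2, 3, 4):
        # Specific cell: the variable's total minus its three mutual cells.
        mutual = sum(n for cell, n in cells.items() if len(cell) == 2 and i in cell)
        cells[frozenset({i})] = totals[i] - mutual
    return cells

def correlation(cells, i, j):
    """r_ij = common elements / geometric mean of the two totals."""
    def total(v):
        return sum(n for cell, n in cells.items() if v in cell)
    common = sum(n for cell, n in cells.items() if {i, j} <= cell)
    return common / sqrt(total(i) * total(j))

r = {(1, 2): .72, (1, 3): .39, (1, 4): .46, (2, 3): .51, (2, 4): .67, (3, 4): .55}
cells = general_diagram(r, (3, 5, 7, 11))   # hypothetical m's
print({pair: round(correlation(cells, *pair), 6) for pair in r})
```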

I find it myself most expeditious not to attempt to give values to
the n’s and to q but, after giving values to the four m’s, to operate
thereafter in terms of the rules of section II of the present paper so as
to obliterate any negative cells.

As regards the m’s, the following rule should be noted: 

Rule G.—Increasing an m decreases (algebraically) the three
other specific cells, and increases the three simple linkage or mutual
cells. It also usually increases (algebraically) its own specific cell,
unless that cell is already very negative (more negative than
−Cm²), in which case slight increases in m may make it worse, though
large increases in m will again increase it algebraically.
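One way to see where the threshold −Cm² in Rule G comes from is to hold everything except m₁ fixed. Take the n's and q as zero, as in the illustrations that follow. The specific cell of x₁ is then its total C₁m₁² less its three mutual cells m₁m₂, m₁m₃ and D₁m₁m₄. The following is our own derivation under that assumption, not the paper's.

```latex
\begin{aligned}
s_1(m_1) &= C_1 m_1^2 - T m_1, \qquad T = m_2 + m_3 + D_1 m_4,\\
\frac{ds_1}{dm_1} &= 2C_1 m_1 - T,\\
\frac{ds_1}{dm_1} > 0 &\iff T m_1 < 2C_1 m_1^2 \iff s_1 > -C_1 m_1^2 \quad (m_1 > 0).
\end{aligned}
```

A small increase in m₁ therefore makes s₁ worse exactly when s₁ is already below −C₁m₁². Because the C₁m₁² term dominates for large m₁, a large enough increase always makes s₁ grow again. The same products m₁m₂, m₁m₃ and D₁m₁m₄ that enlarge the mutual cells are subtracted from the specifics of x₂, x₃ and x₄, which gives the first half of the rule.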





Fig. 12. 




V. FURTHER ILLUSTRATIONS


As an illustration, let us take first a simple case where the correlations
are low: r₁₂ = .15, r₁₃ = .10, r₂₃ = .18, r₁₄ = .22, r₂₄ = .20,
r₃₄ = .25. Here

C₁ = 12.00   D₁ = 0.76
C₂ = 3.71    D₂ = 0.39
C₃ = 8.33    D₃ = 0.72


With these values the general diagram will give the proper values of
the r’s for any values of the m’s, n’s, and q, provided no cells thereby
become negative. Leaving the n’s and q zero, put m₁ = m₂ = m₃ = 10,
m₄ = 20 and we obtain:





Fig. 13.


This gives the required correlations, and by methods D, E and F 
the structure can now be altered in many directions without changing 
those correlations. A study of this diagram will show also the operation
of method G. Had we taken m₄ = 10 instead of 20, the total
number of factors in x₄ would have been only 100, that is, one-quarter
of the present number. The cells containing 144, 152 and 78 would
have been halved, and the specifics 489, 848, and 93 correspondingly
increased. The specific cell of x₄, however, would be much decreased,
for its other occupied cells are only halved while its total is quartered.
Indeed 26 sinks to a negative number, which is why m₄ = 20 was
taken, and not m₄ = 10.
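The figures quoted for Fig. 13 follow from the printed, rounded values of C and D, with the n's and q zero. This Python sketch (the helper is ours) recomputes the four specific cells for m₄ = 20 and for m₄ = 10.

```python
# Rounded constants printed in the text for r12 = .15, r13 = .10, r23 = .18,
# r14 = .22, r24 = .20, r34 = .25.
C = {1: 12.00, 2: 3.71, 3: 8.33}
D = {1: 0.76, 2: 0.39, 3: 0.72}

def specifics(m1, m2, m3, m4):
    """Specific cells of the general diagram with all n's and q zero."""
    m = {1: m1, 2: m2, 3: m3, 4: m4}
    mutual = {(1, 2): m1 * m2, (1, 3): m1 * m3, (2, 3): m2 * m3,
              (1, 4): D[1] * m1 * m4, (2, 4): D[2] * m2 * m4,
              (3, 4): D[3] * m3 * m4}
    totals = {i: C[i] * m[i] ** 2 for i in (1, 2, 3)}
    totals[4] = m4 ** 2
    return {i: round(totals[i] - sum(n for p, n in mutual.items() if i in p))
            for i in m}

fig13 = specifics(10, 10, 10, 20)
smaller_m4 = specifics(10, 10, 10, 10)
print(fig13)        # {1: 848, 2: 93, 3: 489, 4: 26}
print(smaller_m4)   # {1: 924, 2: 132, 3: 561, 4: -87}
```

Halving m₄ frees 76, 39 and 72 elements (half of 152, 78 and 144) for the other three specifics. Meanwhile the specific of x₄ drops from 26 to −87, which is the case the text rejects.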
















In a case where the correlations are higher, more manipulation is 
needed before the undesirable negative cells are obliterated. Take 
the case already used in section III of the present paper where solu- 
tions were arrived at in which each variable had the same number of 
factors. We can now make solutions free from that limitation. We 
have: 


r₁₂ = .72   r₁₃ = .39   r₁₄ = .46
r₂₃ = .51   r₂₄ = .67   r₃₄ = .55
whence
C₁ = 1.81   D₁ = 0.62
C₂ = 1.06   D₂ = 0.69
C₃ = 3.62   D₃ = 1.05
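The constants above follow from the six correlations. Computed without rounding, they agree with the printed values to within about 0.01 (C₁ comes out as 1.816). This Python sketch (our own helper) then gives the specific cells of Fig. 14 and of the trial with m₃ = 9, again with the n's and q zero. It shows the three negative specifics. It also shows the effect predicted by Rule G: lowering m₃ lowers the specific of x₃ and raises the other three.

```python
from math import sqrt

r = {(1, 2): .72, (1, 3): .39, (1, 4): .46, (2, 3): .51, (2, 4): .67, (3, 4): .55}
C = {1: r[2, 3] / (r[1, 2] * r[1, 3]),
     2: r[1, 3] / (r[1, 2] * r[2, 3]),
     3: r[1, 2] / (r[1, 3] * r[2, 3])}
D = {i: r[i, 4] * sqrt(C[i]) for i in (1, 2, 3)}

def specifics(m1, m2, m3, m4):
    """Specific cells of the general diagram with all n's and q zero."""
    m = {1: m1, 2: m2, 3: m3, 4: m4}
    mutual = {(1, 2): m1 * m2, (1, 3): m1 * m3, (2, 3): m2 * m3,
              (1, 4): D[1] * m1 * m4, (2, 4): D[2] * m2 * m4,
              (3, 4): D[3] * m3 * m4}
    totals = {i: C[i] * m[i] ** 2 for i in (1, 2, 3)}
    totals[4] = m4 ** 2
    return {i: round(totals[i] - sum(n for p, n in mutual.items() if i in p), 1)
            for i in m}

print({i: round(C[i], 3) for i in C}, {i: round(D[i], 3) for i in D})
fig14 = specifics(10, 10, 10, 10)   # all m's equal to 10
fig15 = specifics(10, 10, 9, 10)    # m3 reduced to 9
print(fig14)
print(fig15)
```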


If we begin by taking m₁ = m₂ = m₃ = m₄ = 10 and all n’s and q =
zero, we obtain the diagram





Fig. 14.


This contains three strongly negative cells, and if we endeavor 
to obliterate these by methods D, E, and F we find the greatest difficulty 
in doing so completely. By reducing m₃, however, we can reduce the
specific cell of x₃ while increasing the specific cells of the other vari-
ables. Try therefore m₁ = m₂ = m₄ = 10, m₃ = 9.




We obtain 





Fig. 15. 


To this we can add and subtract various doles of the diagrams D, E 
and F. We again find difficulty in completely obliterating the 
negative cells, although adding 7D, 83D, and subtracting 62E3, 
almost does so, giving 





Fig. 16.


Indeed with this diagram, if we simply delete the −1, we have a
structure which gives very closely indeed the required correlations. 
However, by further changes in the m’s, guided by method G, we can 





reach diagrams without such forcing. For example, m₁ = 10, m₂ =
15, m₃ = 9, m₄ = 11 gives

Fig. 17.




To this add 68D₃, 46D₁, 59D₄ in succession, and subtract 51E₃₄,
reaching





Fig. 18.


which also gives the required correlations. It can be obtained directly 
from the general diagram by putting m₁ = 10, m₂ = 15, m₃ = 9,
m =11, n, = 71%, ne = 25%, nz; = 42%, ng = 334, ¢ = 25. 
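Fig. 18 can be checked by replaying the operations on Fig. 17. The subscripts in the printed list are partly illegible. This Python sketch reads them as 68D₃, 46D₁, 59D₄ and −51E₃₄, the assignment under which the stated multiples leave no cell more than a fraction of an element below zero. With unrounded C and D the smallest cell is about −0.07. The sketch therefore rounds every cell to whole elements, as a figure would, and confirms that the rounded diagram still gives the six correlations to within 0.005. The helper names are ours.

```python
from itertools import combinations
from math import sqrt

VARS = (1, 2, 3, 4)
S = frozenset
r = {(1, 2): .72, (1, 3): .39, (1, 4): .46, (2, 3): .51, (2, 4): .67, (3, 4): .55}

def m_diagram(m):
    """General diagram with every n and q zero: mutual and specific cells only."""
    m = dict(zip(VARS, m))
    C = {1: r[2, 3] / (r[1, 2] * r[1, 3]), 2: r[1, 3] / (r[1, 2] * r[2, 3]),
         3: r[1, 2] / (r[1, 3] * r[2, 3])}
    D = {i: r[i, 4] * sqrt(C[i]) for i in (1, 2, 3)}
    cells = {S({i, j}): m[i] * m[j] for i, j in ((1, 2), (1, 3), (2, 3))}
    cells.update({S({i, 4}): D[i] * m[i] * m[4] for i in (1, 2, 3)})
    totals = {1: C[1] * m[1] ** 2, 2: C[2] * m[2] ** 2, 3: C[3] * m[3] ** 2,
              4: m[4] ** 2}
    for i in VARS:
        cells[S({i})] = totals[i] - sum(n for c, n in cells.items()
                                        if len(c) == 2 and i in c)
    return cells

def pattern_d(k):
    """D_k: outside x_k, white (odd) cells plus, shaded (even) cells minus."""
    rest = [v for v in VARS if v != k]
    return {S(c): (1 if n % 2 else -1) for n in (1, 2, 3)
            for c in combinations(rest, n)}

def pattern_e(i, j):
    """E_ij = A_ij - B_ij, written out cell by cell."""
    k, l = [v for v in VARS if v not in (i, j)]
    return {S({i, j}): 1, S({i}): -1, S({j}): -1, S(VARS): -1,
            S({i, k, l}): 1, S({j, k, l}): 1, S({k, l}): -1}

def add(cells, pattern, times):
    for c, n in pattern.items():
        cells[c] = cells.get(c, 0) + times * n

def correlation(cells, i, j):
    def total(v):
        return sum(n for c, n in cells.items() if v in c)
    common = sum(n for c, n in cells.items() if {i, j} <= c)
    return common / sqrt(total(i) * total(j))

cells = m_diagram((10, 15, 9, 11))              # Fig. 17: every specific negative
for k, times in ((3, 68), (1, 46), (4, 59)):
    add(cells, pattern_d(k), times)
add(cells, pattern_e(3, 4), -51)
fig18 = {c: round(n) for c, n in cells.items()}  # whole elements only

print("smallest exact cell:", round(min(cells.values()), 2))
print("smallest rounded cell:", min(fig18.values()))
print({p: round(correlation(fig18, *p), 3) for p in r})
```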







VI. SUMMARY


A general formula has been given for the construction, with all- 
or-none elements, of structure diagrams representing four correlated 
variables, and rules for the manipulation of such structures with- 
out changing the correlations, thus enabling numbers of equivalent 
structures to be rapidly made for the purpose of aiding the interpre- 
tation of correlations between four variables. 










THE EFFECT OF TYPE OF TRAINING UPON 
TRANSFERENCE 


HERBERT WOODROW 


University of Minnesota 


When an experimental study has shown that practice in one operation,
e.g., memorizing poetry, fails to produce improvement in another,
e.g., memorizing prose,¹ does this mean that the same conclusion is
valid no matter what type of practice is used? Does it hold for prac- 
tice accompanied by explanation of methods and illustrations of how 
these methods should be applied in the performance of other tasks 
than the one in which the individual is drilled? It is worth while to 
know that an experimenter allowed his subjects practice of a sort 
which did not result in a large degree of transference; but is it not of 
at least equal value to know how training can be given so as to transfer, 
that is, so as to produce improvement in a number of related functions? 

That training of the latter sort is possible has been indicated by 
various experimental findings. The experiments particularly referred 
to are not the great mass of transference experiments. These latter, 
however, it should be noted in passing, have shown that improve- 
ment resulting from almost any sort of practice yields, as a rule, some 
transference. The lack of transference has in the past sometimes been 


exaggerated. Pillsbury pointedly remarks that “A few men hold
and more have held that the result indicates that there is no transfer
of training whatsoever.”² Stratton sums the matter up rather
definitely when he writes that, ‘‘The experiments in clear support of 
this doctrine—that you train merely what you train—are few; most 
experiments contradict it.”³


The experiments which are particularly important, however, are
those which show that, while practice of a narrow, routine sort may
produce very little improvement in activities other than the one
practiced, practice differently directed may be of marked benefit.
Only a few such experiments have ever been made, and even these few
have not always been reported in an adequate statistical manner.
Probably the most important of these experiments, though one for 








1 For such an experimental study, see Sleight, W. G.: Memory and Formal
Training. British Journal of Psychology, 1911, Vol. IV, pp. 417 and 431.
2 Pillsbury, W. B.: “Education as the Psychologist Sees It.” 1925, p. 300.
3 Stratton, G. M.: “Developing Mental Power.” 1923, p. 12.




which detailed data were never published, is one described by Judd.' 
In this experiment, two groups of boys were given practice in shooting 
at a target beneath water and, in the case of one of the groups, this 
practice was accompanied by a full theoretical explanation of refrac- 
tion. When the depth of the target was changed, the group which had 
been taught the theory showed a striking superiority over the other 
group. Mention should also be made of the experiments of Squire 
and of Ruediger. Squire, according to the report of her experiment 
by Bagley,² found that the habit of producing neat arithmetic papers
failed to produce the slightest improvement in the neatness of language
and spelling papers. Such findings constitute a remarkable illus-
tration of the utter inadequacy of certain methods of training. Ruediger,³
however, conducted an investigation through teachers which
showed that, while teaching the habit of neatness in one school sub-
ject, one could, by inculcating the ideal of neatness, obtain an improve-
ment in the paper in other school subjects, even though the topic of 
neatness was scrupulously avoided in connection with these other 
subjects. 

May not the general problem which is raised by these few citations 
be stated as the problem of the difference, with respect to the resulting 
transference, between unenlightened drill and intelligent teaching?
It has been so conceived in the experiment to be described. 

This experiment deals with the possibility of teaching a general
technique of memorizing. Its object is to show that training in certain
kinds of memorizing may be given in two such widely different ways
that in the one case the individual will benefit little or not at all
and, in the other case, enormously, when he turns to new kinds of
memorizing. The technique of memorizing was chosen largely because
it is among those techniques concerning which we possess fairly
adequate knowledge. It would, perhaps, have been more interesting to
attempt to teach the proper technique of reasoning; but it was believed
too hazardous to attempt to teach what is not yet known. The experiment
which was carried out on memorizing has, however, its analogy
in any field of teaching where training is given in the hope that it will
result in a better ability to undertake activities related to, though





1 Judd, C. H.: The Relation of Special Training to General Intelligence.
Educational Review, Vol. XXXVI, 1908, pp. 36-37.

2 Bagley, W. C.: “The Educative Process.” 1916, p. 208.

3 Ruediger, W. C.: The Indirect Improvement of Mental Functions through
Ideals. Educational Review, Vol. XXXVI, 1908, pp. 364-371.







different from, the particular performances in which the individual is 
trained. 

Three groups of subjects were required in the experiment. These 
will be designated the control, practice and training groups. The 


control group, consisting of one hundred and six university sophomores,
was given no practice or training in memorizing, but was tested
in six different forms of memorizing at the beginning and end of a 
period of four weeks and five days. The two other groups, the prac- 
tice and the training groups, numbering thirty-four and forty-two, 
respectively, were given at the beginning and end of this period the 
same tests as the control group. The analysis of the procedure 
employed with the practice and training groups will be postponed 
until these ‘‘end-tests,” given to all three groups, have been described, 
and data presented on their degree of relationship as shown by their 
intercorrelations. 

The final and initial forms of each end-test were similar in form, 
but the actual material to be memorized was entirely different. All 
the material, except that used in testing memory-span, was presented 
to the subjects for study on mimeographed sheets. The tests were 
given to the subjects in groups, each group consisting of a section of 
thirty to forty-five sophomore students in experimental psychology. 
The six end-tests were as follows:

1. Rote Poetry.—The time required for learning verbatim certain
verses was the measure used. The initial test consisted of
four verses of “Mistress Gilpin,” comprising twenty-eight syllables
each. The final test consisted of five verses of “Alice Brand,” of
twenty-eight syllables each. This second poem was harder to memo- 
rize than the first, so that for all groups it required a longer time. 

2. Rote Prose.—The initial test consisted of a passage of ninety-eight
words from Benjamin Franklin’s “Autobiography.” The
time required for verbatim learning was recorded. 

3. Facts.—A test of memory for the substance of prose sentences. 
The initial and final tests each contained two lists of twenty miscel- 
laneous items of information, gleaned from a dictionary of facts. 
The subjects were allowed six minutes for study of the facts, then 
after a thirty seconds’ pause, were given fifteen minutes, a very liberal 
period, to write out all that they could remember. For purposes of 
scoring, the statement of each fact was divided into from four to eight 
parts, in such a way that each part, in the experimenter’s opinion, was 
about equally important. The subjects’ papers were scored by determining
the number of these parts which, in substance, were correctly
reported. Every effort was made to keep the subjective element in 
the scoring as uniform as possible. The maximum possible score in 
the initial test was ninety-nine, and in the final test, ninety. 

4. Turkish-English Vocabulary.—Each end-test consisted of a
list of thirty Turkish words with their English equivalents.¹ The
subjects were allowed six minutes for study of the vocabulary. 
Then, after one minute of intermission the Turkish words, printed in 
large type, were shown at the rate of one every six seconds, and the 
subjects endeavored to write the corresponding English words. 

5. Historical Dates.—Lists of twenty little-known historical
events, together with their dates, were given the subjects to study for 
a period of six minutes. One minute after the end of this period, the 
names of the events, printed on large cards, were shown at the 
rate of one every six seconds, and the subjects attempted to write the 
corresponding dates. Half credit was given when the last two figures 
of the date were correct but the century wrong. No credit was 
given when the century alone was right. The score was the total 
number of correct answers, including the half-credits. 

6. Memory span, for consonants, presented orally at the rate of one 
per second. Both initial and final tests included two series of lists, 
each series extending from a list of four to a list of ten, inclusive. The 
series were treated as series of minimal changes given to determine the 
point at which the chances of getting the series right or wrong are 
equal. The scores made on the two series were averaged. 
The above six tests, although they are all called tests of memory,
do not show a particularly high degree of interrelationship. Some of
them show a very low correlation with the others. This would indi-
cate that any technique which benefits performance in all of these
tests must possess a wide range of applicability. The degree of inter-
relationship between the tests is shown in Table I, which gives the
coefficients of correlation between the six tests, together with their
probable errors, as obtained with the one hundred and six subjects
constituting the control group. These coefficients have been corrected
for attenuation, so as to avoid the possibility of conveying an exag-
gerated impression of the degree of alienation between the tests, which
is undoubtedly large. The calculations were made by the formula

$$r_{\infty\infty} = \frac{\bar{r}}{\sqrt{r_{11}\,r_{22}}},$$

in which r̄ is the average of the four Pearsonian





1 This material was compiled by Dr. W. S. Foster for use in connection with his
laboratory manual, “Experiments in Psychology,” 1923.







coefficients of correlation obtainable between the two scores (initial
and final) of any two end-tests, and r₁₁ and r₂₂ are the coefficients
of reliability of those two end-tests. The coefficient of reliability 
of any end-test is the coefficient of correlation between its initial 
and final forms. 
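The correction for attenuation described above can be written as a short Python function; the name is ours. The four raw cross-correlations in the usage line are hypothetical. The reliabilities .67 and .49 are the values reported below for rote poetry and rote prose.

```python
from math import sqrt
from statistics import mean

def corrected_for_attenuation(cross_rs, rel_a, rel_b):
    """Mean of the four initial/final cross-correlations divided by the
    square root of the product of the two reliability coefficients."""
    if len(cross_rs) != 4:
        raise ValueError("expected four initial/final cross-correlations")
    return mean(cross_rs) / sqrt(rel_a * rel_b)

# Hypothetical raw cross-correlations between two end-tests.
raw = [.50, .55, .52, .53]
print(round(corrected_for_attenuation(raw, .67, .49), 2))   # 0.92
```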

The obtained coefficients of reliability were as follows: 


COEFFICIENTS OF RELIABILITY

Test                                    r      PE
Rote Poetry                             +.67   ±.05
Rote Prose                              +.49   ±.07
Facts                                   +.48   ±.07
Historical Dates                        +.60   ±.06
Turkish-English Vocabulary              +.70   ±.05
Memory Span (Auditory, for Consonants)  +.55   ±.07


These coefficients of reliability are not as high as one might expect. 
It should be remembered, however, that the initial and final forms of 
any test comprised entirely different material, and that the subjects 
constituted a rather homogeneous, highly selected group. 


TABLE I.—INTERCORRELATIONS OF THE END-TESTS, CORRECTED FOR ATTENUATION
Each Coefficient has Its Probable Error¹ Given Immediately below It

Tests       Poetry     Prose      Facts      Dates      Vocabulary Span
Poetry                 +.91       +.46       +.28       +.34       +.15
                       ±.05       ±.08       ±.08       ±.08       ±.09
Prose       +.91                  +.45       +.34       +.31       +.14
            ±.05                  ±.09       ±.09       ±.08       ±.09
Facts       +.46       +.45                  +.55       +.52       +.39
            ±.08       ±.09                  ±.08       ±.07       ±.09
Dates       +.28       +.34       +.55                  +.73       +.12
            ±.08       ±.09       ±.08                  ±.05       ±.09
Vocabulary  +.34       +.31       +.52       +.73                  +.06
            ±.08       ±.08       ±.07       ±.05                  ±.08
Span        +.15       +.14       +.39       +.12       +.06
            ±.09       ±.09       ±.09       ±.09       ±.08

1 Probable errors are calculated by the formula for PE(r∞∞) given in Kelley,
T. L.: “Statistical Method.” 1923, p. 210.




So far as the present experiment is concerned, Table I is of interest, 
primarily, as indicating the degree of heterogeneity of the end-tests. 
It is true that several of the coefficients of correlation in the table are 
rather high as, for example, that for prose with poetry (both of which 
tests require verbatim memorizing) and also that for dates with vocab- 
ulary (both of which demand the formation of paired associates). 
The average of all the coefficients in the table, however, is only +.38,
sufficiently low, considering that the coefficients have been corrected
for attenuation, to indicate that the end-tests taken as a whole bring
into play widely dissimilar activities.
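The +.38 figure can be checked from the fifteen distinct coefficients of Table I. The table is symmetric, so only the upper triangle is needed. Here is a short Python sketch:

```python
from statistics import mean

TESTS = ["Poetry", "Prose", "Facts", "Dates", "Vocabulary", "Span"]
# Upper triangle of Table I, row by row (each row starts after the diagonal).
UPPER = [[.91, .46, .28, .34, .15],
         [.45, .34, .31, .14],
         [.55, .52, .39],
         [.73, .12],
         [.06]]

pairs = {(TESTS[i], TESTS[i + 1 + k]): r
         for i, row in enumerate(UPPER) for k, r in enumerate(row)}

print(len(pairs), round(mean(pairs.values()), 2))   # 15 0.38
print(max(pairs, key=pairs.get))                    # ('Poetry', 'Prose')
```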

The primary object of the present investigation is not to study the 
results obtained with the end-tests given the control group, but to 
compare the “practice” group with the “training” group. It is 
important therefore, to record exactly the procedure employed with 
these two groups. They were both given practice, between the 
initial and final end-tests, in two forms of memorizing, namely, learning 
poetry verbatim and learning nonsense syllables in columns of pairs.
To the practice group this drill was given, as has been customary in 
experiments on transference, in a routine fashion without any explana- 
tion of principles, discussion of methods, or comparison of the methods 
to be used in different kinds of problems. To the training group 
some practice was given with the same materials utilized by the practice
group but, in addition, instruction was given in the technique
of memorizing. The total time consumed by each group, apart
from the end-tests, was 177 minutes divided into periods occurring 
twice a week for four weeks. The periods varied from 19 to 28 
minutes and averaged 22 minutes. Of the total time, the practice 
group spent 90 minutes in memorizing poetry and 87 minutes in memorizing
nonsense syllables. The members of the training group divided
their time as follows: for a total of 76 minutes, they listened to an 
exposition of the technique of memorizing, including rules and illus- 
trations of how these rules should be applied: for 76 minutes, they 
memorized poetry; and for 25 minutes, they studied nonsense syl- 
lables. In the practice periods, both groups were motivated 
by the assurance of the experimenter that such work would improve 
their memory. In the case of the training group, however, all the 
practice in memorizing was done with the purpose of attempting, as 
far as possible, to apply the rules that had been learned. Nothing 
was said to the practice group as to how they should memorize, except 
that they were told to memorize by heart, and when they finished one 










selection to begin another. With the training group, on the other 
hand, an attempt was made to give a thorough understanding of the 
technique of memorizing. No attempt was made to obtain a record 
of the improvement of the groups during the training periods. In 
view of the object of the experiment, such a record seemed unneces- 
sary, and the testing required to secure it would have added a degree 
of similarity to the training of the two groups which it was preferred 
to avoid. Such testing, moreover, would have considerably reduced 
the time, already too short, that was available for practice. To both 
groups, the final end-tests were given five days after the last practice 
period. 

The detailed program for the eight practice periods of each group 
is shown in Table II. 

The rules which were taught to the members of the training group, 
and which they applied to the memorizing of poetry and lists of pairs 
of nonsense syllables, were the following: 


1. Learning by wholes. 

2. Use of active self-testing. 

3. Use of rhythm and of grouping. 

4. Attention to meaning and the advantage of picturing or, depending
upon the individual, otherwise symbolizing the meaning.

5. Mental alertness and concentration. 

6. Confidence in ability to memorize. 

7. Use of secondary associations. 


Some of the methods which were taught may not be particularly 
valuable, but the experiment was not planned to determine their 
relative merit. 

It should be emphasized that neither group was given practice in
the forms of memorizing tested by any of the end-tests, with the excep-
tion of the practice in memorizing poetry. Methods were illustrated
solely by the use of poetry and nonsense syllables, and practice was 
limited to the same materials. For example, after explaining what is 
meant by secondary associations, the experimenter wrote nonsense 
syllables on the black-board and pointed out numerous secondary 
associations which could be used to remember them in pairs. It was 
then explained that similar associations could readily be formed 
between a Turkish word and an English word and that such associa- 
tions should be used when memorizing a Turkish-English vocabulary. 
The group was given practice, however, only in the use of secondary 


TABLE II.—PROGRAM OF PRACTICE PERIODS

Period  Practice group                       Training group
I       20 minutes: memorizing poetry        7 minutes: listening to exposition of rules
                                             13 minutes: memorizing poetry
II      25 minutes: memorizing poetry        7 minutes: listening to exposition of rules
                                             18 minutes: memorizing poetry
III     28 minutes: memorizing nonsense      28 minutes: listening to exposition and
          syllables                            illustration of rules
IV      20 minutes: memorizing nonsense      5 minutes: listening to review of
          syllables                            preceding period
                                             15 minutes: memorizing nonsense syllables
V       19 minutes: memorizing nonsense      9 minutes: attending to “blackboard talk”
          syllables                            on meaning of secondary associations
                                             10 minutes: memorizing nonsense syllables
VI      25 minutes: memorizing poetry        25 minutes: memorizing poetry
VII     20 minutes: memorizing poetry        20 minutes: memorizing poetry
VIII    20 minutes: memorizing nonsense      20 minutes: listening to review of methods,
          syllables                            and the situations in which to use them
Total   177 minutes                          177 minutes








associations in memorizing pairs of nonsense syllables. 


Again, in illustrating the meaning of grouping, it was pointed out how, in the
case of either nonsense syllables or a line of poetry, grouping could be
secured by means of accent. It was then explained that when attempting
to remember a list of consonants given orally, this same sort of
grouping should be employed. Such exercises as these do not con- 
stitute practice in memorizing either Turkish-English vocabularies or 
lists of consonants, and no such practice was given. Likewise, prac- 
tice in learning poetry by the whole-method, even though it be accom- 
panied by information that the same method should be used in the case 
of prose, is a far different thing from practice in memorizing prose. 
It is not alleged that the exercises used with the training group were as 
effective as would have been a few moments of practice in the exact 
forms of memorizing for which these exercises prepared. This direct 
practice was avoided, however, as the whole object of the experiment 
was to determine whether, without such direct practice, marked 
improvement in ability could yet be secured. 

The scores made on the end-tests by each of the three groups are
summarized in Table III. This table shows the average score on both
the initial and final forms of each end-test. In the fifth column
is given the difference between the average initial and the average
final score. It will be noticed that in two cases, namely, memory for
poetry and memory for facts, this difference, at least for the control
group, is negative. The explanation of these negative differences is
very simple. In the case of poetry, it is due to the fact that the final
poetry test required memorizing of a longer, and possibly harder,
poem than the initial test. In the case of memory for facts, it is
undoubtedly due to the circumstance that the second list of facts
was scored in units (phrases or ideas) which were not identical with
those of the first list of facts and which yielded a smaller maximum
possible score. These negative differences, or losses, in no wise hinder
a comparison between the groups. For example, while the control
group required 172 seconds longer to master the final poem than to
learn the first one, the training group took only 57 seconds longer.
In other words, the training group did better than the control group
by 172 minus 57, or 115 seconds.

A study of the entire column of differences between initial and final 
scores shows that the practice group sometimes improved more, and 
sometimes less, than the control group. In other words, the data 
indicate that the practice given this group sometimes helped, but in 
two cases interfered, with performance on the final tests. In the case 
of all the end-tests except one, however, the difference in improvement 
between the practice and control groups is small and statistically 
unreliable. Just how significant the differences are is shown in 
Table IV. The training group, on the other hand, shows for
every end-test a decidedly greater improvement than either the
practice or the control group.

TABLE III. DIFFERENCES BETWEEN INITIAL AND FINAL PERFORMANCES ON THE END-TESTS
N: Control Group = 106; Practice Group = 34; Training Group = 42.
Units of measurement: see description of tests.

Test               Group         Initial  Final    Difference  P.E.    Percentage of
                                 average  average                      gain or loss
Poetry             Control       524      696      −172        ±11     −32.8
                   Practice      571      737      −166        ±18     −29.1
                   Training      539      596      −57         ±15     −10.6
Prose              Control       637      454      +183        ±11     +28.7
                   Practice      654      487      +167        ±18     +25.5
                   Training      731      361      +370        ±22     +50.7
Facts (substance)  Control       67.5     64.2     −3.3        ±0.96   −4.9
                   Practice      64.0     61.0     −3.0        ±1.68   −4.7
                   Training      64.0     72.2     +8.2        ±1.28   +12.8
Dates              Control       7.6      9.8      +2.2        ±0.21   +29.0
                   Practice      7.2      9.9      +2.7        ±0.38   +37.5
                   Training      6.5      12.2     +5.7        ±0.38   +87.7
Vocabulary         Control       16.2     16.1     −0.1        ±0.32   −0.4
(Turkish-English)  Practice      14.6     15.1     +0.5        ±0.51   +3.4
                   Training      13.6     21.1     +7.5        ±0.59   +55.2
Span (consonants)  Control       6.6      7.0      +0.4        ±0.05   +6.7
                   Practice      7.0      6.6      −0.4        ±0.06   −5.7
                   Training      6.4      7.7      +1.3        ±0.10   +20.3

In the extreme right-hand column of Table III, headed “Percentage of
gain or loss,” is given the difference between final and initial scores,
reckoned as a percentage of the initial score. These percentages permit
a comparison of the various groups which is less affected by the size
of the initial scores than one based upon the absolute differences
between final and initial scores. That this is true was determined by
sectioning the groups into those whose initial scores were above or
below the average. The gains (or losses) shown by two such sections of
a group usually differed little when calculated as a percentage of
initial score. It should be observed, however, that the differences in
the initial scores of the various sections are in no case large.
Moreover, in all tests except “rote prose,” the initial ability of the
training group happened to be lower than that of the practice group.
Consequently, it is all the more convincing that the scores made by the
training group are, in all the final tests, much better than those of
the practice group. The average percentage of improvement in the
end-tests, calculated from Table III, is 4.5 for the practice group and
36.1 for the training group. Thus, the difference in the average
percentage of gain of the two groups is 31.6. In the separate tests,
this difference, which is always in favor of the training group, varies
from 17.5 (in memory for facts) to 51.8 (in the Turkish-English
vocabulary test).
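
The averages just quoted can be recomputed from the right-hand column of Table III. The Python sketch below is only a check on the arithmetic; the variable and function names are ours, not the author's. It recomputes one percentage from the raw averages, then the mean percentage gain of each group and the per-test advantage of the training group over the practice group. Working from the rounded percentages printed in the table, the training-group mean comes out at about 36.0 rather than the 36.1 reported, a small difference presumably due to rounding.

```python
# Right-hand column of Table III: the signed difference (positive = improvement)
# reckoned as a percentage of the initial average score.

TESTS = ["poetry", "prose", "facts", "dates", "vocabulary", "span"]

# Percentages of gain or loss as printed in Table III, in the order of TESTS.
PERCENT_GAIN = {
    "control": [-32.8, 28.7, -4.9, 29.0, -0.4, 6.7],
    "practice": [-29.1, 25.5, -4.7, 37.5, 3.4, -5.7],
    "training": [-10.6, 50.7, 12.8, 87.7, 55.2, 20.3],
}


def percent_gain(difference, initial):
    """Signed difference expressed as a percentage of the initial average."""
    return 100.0 * difference / initial


def mean(values):
    return sum(values) / len(values)


practice_mean = mean(PERCENT_GAIN["practice"])
training_mean = mean(PERCENT_GAIN["training"])

# Per-test advantage of the training group over the practice group,
# in percentage points.
advantage = {
    test: round(t - p, 1)
    for test, t, p in zip(TESTS, PERCENT_GAIN["training"], PERCENT_GAIN["practice"])
}

if __name__ == "__main__":
    print(round(percent_gain(-172, 524), 1))  # control group, poetry
    print(round(practice_mean, 1), round(training_mean, 1))
    print(advantage)
```

The smallest advantage (facts) and the largest (vocabulary) fall out of the `advantage` dictionary directly.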


Any improvement shown by either the practice or the training group
in the final tests is, of course, due to two factors. One of these is the
repetition in the final tests of the same kinds of memorizing as those 
required in the initial tests. The other is the effect of the exercises 
given between the initial and final tests. In order to compare the 
effects of the latter in the practice and training groups (which is the 
main object of the experiment), it is helpful first to subtract from their 
improvement the improvement due simply to repetition of the 
end-tests, shown by the control group. In this way, one obtains a 
reasonably accurate measure of the gain (or loss) shown by the prac- 
tice or training groups, in so far as this is due solely to the effect of the 
practice or training periods. The gains, or losses, of the practice and 
training groups, thus computed, that is, after algebraic subtraction 
of the gains or losses of the control group, are shown in Table IV. 
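
The subtraction just described, together with the probable errors printed in Table IV, can be sketched in Python. The paper does not state how the probable error of a difference was obtained. This sketch assumes the usual formula for two independent groups, PE_diff = sqrt(PE_a² + PE_b²). With the PE first rounded to the precision shown, that formula reproduces the printed poetry and facts entries. The function name `gain_over` is hypothetical.

```python
import math

# (difference, probable error) for each group, copied from Table III.
POETRY = {"control": (-172, 11), "practice": (-166, 18), "training": (-57, 15)}
FACTS = {"control": (-3.3, 0.96), "practice": (-3.0, 1.68), "training": (8.2, 1.28)}


def gain_over(group, baseline, pe_digits=None):
    """Gain of `group` after algebraic subtraction of `baseline`'s gain.

    Returns (difference, probable error of the difference, D/PE). The PE of
    the difference is taken as sqrt(PE_a**2 + PE_b**2), which assumes the two
    groups are independent. If `pe_digits` is given, the PE is rounded to that
    many decimals before dividing, as the printed ratios appear to do.
    """
    diff_a, pe_a = group
    diff_b, pe_b = baseline
    difference = diff_a - diff_b
    pe = math.hypot(pe_a, pe_b)
    if pe_digits is not None:
        pe = round(pe, pe_digits)
    return difference, pe, abs(difference) / pe


if __name__ == "__main__":
    for name, table, digits in (("poetry", POETRY, 0), ("facts", FACTS, 1)):
        for group, baseline in (("practice", "control"),
                                ("training", "control"),
                                ("training", "practice")):
            d, pe, ratio = gain_over(table[group], table[baseline], digits)
            print(f"{name}: {group} minus {baseline}: "
                  f"{d:+.1f} ± {pe} (D/PE {ratio:.1f})")
```

For poetry, for example, the training group's gain over the control group is −57 − (−172) = +115 seconds, with a PE of about 19 and a ratio of about 6.1.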


It will be seen from Table IV that, in all cases but two, the practice
given the practice group was beneficial. It appears to have been most
beneficial in the case of memory for historical dates. The two
exceptions, i.e., cases in which the practice periods resulted in a loss
of ability, are those of rote prose and auditory span for consonants.
Only in the case of memory span, upon which the effect of the practice
periods was detrimental, is the effect great enough to be very
significant. Noteworthy is the small improvement shown by the practice
group in memorizing poetry, one of the forms of memorizing in which
this group was given practice. Of course, the activities involved in
memorizing any two poems are not the same, as is indicated by the fact
that a coefficient of correlation of only +.67 was obtained between the
initial and final poetry tests. And one may recall that the pioneer
experiment of William James dealt with the question whether practice
in learning one poem would help in learning a different poem.

That it often does not is indicated, more reliably than by the results
of James himself, by the careful and widely cited study by Sleight.
This investigator found that children who practiced learning poetry
showed not only no improvement but actually a slight loss in the
poetry end-test, relative to the children of the control group, who were
given no practice in memorizing.¹ In the present experiment, the final
poetry test required the learning of a somewhat longer and probably
harder poem than any used in the practice periods. In memorizing
it the practice group showed an average gain over the control group
of six seconds, a gain which is equal only to three-tenths of its
probable error of twenty-one seconds. Such a slight gain in one of the
two forms of memorizing actually practiced suggests that the practice
was of little value even for the purpose of memorizing the type of
material with which the practice was conducted, and raises the question
whether practice which fails to produce marked gain in the activity
practiced could possibly show any considerable degree of transference.
That it may do so is indicated by the results obtained by Sleight, which
show that the group which practiced memorizing poetry fell slightly
below the control group in the final poetry tests, but greatly excelled
the control group (by a gain equal to six times its probable error) in
the final tests of memory for nonsense syllables.


TABLE IV. GAINS IN THE END-TESTS OF THE PRACTICE AND TRAINING GROUPS AFTER
SUBTRACTION OF THE GAINS OF THE CONTROL GROUP; AND GAINS OF THE TRAINING
GROUP AFTER SUBTRACTION OF THE GAINS OF THE PRACTICE GROUP

                 Practice gain          Training gain          Training gain
                 minus control          minus control          minus practice
End-test         Diff.  P.E.   D/PE     Diff.  P.E.   D/PE     Diff.  P.E.   D/PE
Poetry           +6     ±21    0.3      +115   ±19    6.1      +109   ±23    4.7
Prose            −16    ±21    0.8      +187   ±26    7.5      +203   ±28    7.3
Facts            +0.3   ±1.9   0.2      +11.5  ±1.6   7.2      +11.2  ±2.1   5.3
Dates            +0.5   ±0.4   1.3      +3.5   ±0.4   8.8      +3.0   ±0.5   6.0
Vocab.           +0.6   ±0.6   1.0      +7.6   ±0.7   10.9     +7.0   ±0.8   8.8
Span             −0.8   ±0.1   8.0      +0.9   ±0.1   9.0      +1.7   ±0.1   17.0

The training group shows results which stand in striking contrast
with those obtained with the practice group. The improvement shown
by the training group in the final tests is, in the case of every end-test,
much greater than that of the control group. The difference in the
gains of the two groups is in no case less than six times the probable
error of the difference. The superiority of the gains of the training
group over those of the practice group is hardly less pronounced. The
differences in their gains, which are in all cases in favor of the
training group, vary from 4.7 to 17.0 times the probable errors of those
differences.

¹ Op. cit., p. 422.

The facts given establish very definitely certain conclusions. They
show unequivocally that the training group did much better than the
practice group in every one of six end-tests, which were related to each
other only to the extent of an average intercorrelation, after correction
for attenuation, of 38.3 per cent. The percentage of gain in the
end-tests averaged 31.6 higher for the training group than for the control
group. This greater gain on the part of the training group was
obtained in spite of the fact that drill in memorizing was given to the
training group with no other material than that employed with the
practice group. It was produced by using the drill material primarily
as material with which to conduct practice in proper methods of
memorizing and, further, by explaining these methods and calling
attention to the ones which should be employed when new kinds of
memorizing were undertaken. In short, the experiment shows that in a
case where one kind of training, undirected drill, produces amounts
of transference which are sometimes positive and sometimes negative,
but always small, another kind of training with the same drill material
may result in a transference, the effects of which are uniformly large
and positive.

While the investigation that has been described has taken the form
of a laboratory experiment, it is hoped that it shows sufficient analogy 
to school-room situations to have a definite bearing upon education. 
For example, it might be found that the study of some particular sub- 
ject, such as algebra, produces little improvement in ability to reason 
about problems in other subjects. According to the present results, 
however, such findings would tell little with respect to the possible 
value of the aforesaid subject as a medium for the giving of drill in 
the technique of reasoning, or of the results which could be obtained 
by its study under a teacher who so used it. Naturally, such a teacher 
would need to know both the proper technique of reasoning and how 
to illustrate the use of this technique in many widely different fields. 


To determine the amount of transference of ability that did result
from the study of a particular subject is one thing; to determine the
amount of transference which might be secured from the study
of that same subject is a different and far more difficult thing. The
investigation of problems of the latter sort must constitute an
important part of the program for establishing a scientific basis for
methods of teaching.







THE ACCURACY OF INTELLIGENCE QUOTIENTS 
FROM PAIRS OF GROUP TESTS IN THE JUNIOR 
HIGH SCHOOL 


FOWLER D. BROOKS 
Johns Hopkins University 


In measuring intelligence we often use two group tests instead of 
one, upon the very reasonable assumption that by so doing we increase 
the accuracy or reliability of results, although few data are available 
showing exactly how much the precision of measurement is increased 
by using two tests instead of one. 

The accuracy of IQ from a pair of group tests may refer to its 
precision (1) as a measure of an individual’s intelligence, (2) as a 
basis for dividing a class into sections homogeneous in respect to IQ, 
(3) in predicting an individual’s future scholastic or other success, or 
(4) as a basis for sectioning upon predicted or future scholastic success.¹

In this paper I am attempting to give tentative answers to the 
following questions: 

1. How accurately do two group intelligence tests measure a 
junior high school pupil’s IQ, and which of the pairs of tests yield the 
most accurate results? 

2. How much more accurate are the combined results from two 
tests than the results from one? 

3. How closely does sectioning upon mean IQ from two group tests 
agree with sectioning upon criterion IQ? 

4. How much more accurate sectioning is secured from two tests 
than from one? 

The Stanford-Binet and the nine group tests listed below were given,
in the order shown, to each of the 108 pupils entering a junior high
school in Baltimore. The writer gave the Stanford-Binet and about
one-half of the group tests, the remainder being given by a person
trained under supervision to give them. Testing conditions were good.
The testing program covered a period of six weeks.

Miller, Form A.
Otis, Self-Administering (30 min.), Form A.
Illinois, Form A.
Terman, Form A.
Haggerty, Delta 2.
National, A1.
Dearborn, Revised C.
Dearborn, Revised D.
Pintner, Non-language.

¹ On the predictive value of each intelligence test used in this
investigation, together with that of certain achievement tests and other
measures, see Journal of Educational Research, Vol. XII (December, 1925),
pp. 359-369; on the accuracy of each test as a measure of a pupil’s
mental age and IQ and as a basis for three-fold sectioning according to
each of three criteria of intelligence, see School Review, Vol. XXXIV
(May, 1926), pp. 333-342.


THE CRITERIA OF A PUPIL’S IQ


What is the best criterion of the intelligence of these Grade VII 
pupils? I do not know, but from available data I have formulated 
three criteria. 

As a first criterion of a child’s IQ I have taken the mean of his
Stanford-Binet and his nine group-test IQ’s, weighting Stanford-Binet
9 and each group test 1. The weighting is purely arbitrary, being
suggested in part by differences in reliability, types of test elements,
etc. Undoubtedly, as far as reliability is concerned, I have given
Stanford-Binet all the weight it should have, because this weighting
implies that a Stanford-Binet rating is as reliable as the mean of nine
group-test ratings.¹
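
The footnote to this paragraph appeals to Brown's formula (the Spearman-Brown formula), which steps an average intercorrelation of comparable tests up to the reliability of their composite. The short Python sketch below applies it to the two mean intercorrelations the footnote reports. The function name is ours. The sketch does not reproduce the comparison with the Stanford-Binet itself, whose reliability is not reported here.

```python
def brown_reliability(r, n):
    """Brown's (Spearman-Brown) formula: reliability of a composite of n
    comparable tests whose average intercorrelation is r."""
    return n * r / (1 + (n - 1) * r)


if __name__ == "__main__":
    # Mean intercorrelations of the nine group tests given in the footnote.
    for label, r in (("raw", 0.755), ("chronological age constant", 0.545)):
        print(label, round(brown_reliability(r, 9), 3))
```

With n = 9, the values .755 and .545 step up to roughly .965 and .915.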

As a second criterion I have weighted Stanford-Binet 5, each 
group test 1, and have given a weight of 1 to the mean of the IQ ratings 
assigned by three junior high school teachers who had taught the 
group five months. The Stanford-Binet was here purposely given 
less weight than in the first criterion upon the assumption that five 
group tests give as reliable and valid an estimate of a child’s IQ as 
the Stanford-Binet. 

The Stanford-Binet IQ’s form the third criterion. 

Obviously the validity of any conclusions depends upon the validity 
and reliability of the criteria. A defect in the first two criteria is 
that each group test is not evaluated by comparison with an outside 
criterion, but with a criterion of which it constitutes one-eighteenth 
and one-fifteenth, respectively. An outside, comprehensive criterion 
would be better. On the other hand, the combinations I have used 
undoubtedly have some advantage over the Stanford-Binet alone. 
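
The one-eighteenth and one-fifteenth figures follow from the weights. In the first criterion the total weight is 9 + 9 = 18. In the second it is 5 + 9 + 1 = 15. A minimal Python sketch of both weighted criteria follows; the function names and the sample pupil's IQs are hypothetical.

```python
from fractions import Fraction


def weighted_criterion(binet_iq, group_iqs, binet_weight,
                       teacher_iq=None, teacher_weight=0):
    """Weighted mean IQ: Stanford-Binet counted `binet_weight` times, each
    group test once, and (optionally) the teachers' mean rating
    `teacher_weight` times."""
    total = binet_weight * binet_iq + sum(group_iqs)
    weight = binet_weight + len(group_iqs)
    if teacher_iq is not None:
        total += teacher_weight * teacher_iq
        weight += teacher_weight
    return total / weight


def group_test_share(binet_weight, n_group_tests=9, teacher_weight=0):
    """Fraction of the criterion contributed by any one group test."""
    return Fraction(1, binet_weight + n_group_tests + teacher_weight)


if __name__ == "__main__":
    print(group_test_share(9))                    # first criterion: 1/18
    print(group_test_share(5, teacher_weight=1))  # second criterion: 1/15
    # Hypothetical pupil: Stanford-Binet 100, all nine group tests 110.
    print(weighted_criterion(100, [110] * 9, binet_weight=9))  # 105.0
```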





¹ The mean of the 36 intercorrelations of group-test IQ’s is .755, or for
constant chronological age it is .545. Application of Brown’s formula for
the coefficient of reliability, $r_n = \frac{n r}{1 + (n - 1) r}$, indicates
that the combined nine tests probably are as reliable as the Stanford-Binet.

It was hoped that the second criterion would prove of distinct
value, but apparently it does not. The simple correlation between it
and the first criterion is .9951. Since IQ’s are indexes, chronological
age being a common element, this coefficient may be too high.¹ Such
spurious index correlation may be eliminated by the partial correlation
technique. Rendering chronological age constant, the first and second
criteria still correlate .9941. Accordingly the second criterion was
dropped from further use, although it correlates
slightly less with the third criterion than does the first.
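
"Rendering chronological age constant" here means computing a partial correlation. The standard first-order formula is sketched below in Python. The correlations of the criteria with chronological age are not reported in this section, so the values in the usage line are hypothetical, chosen only to show the call.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


if __name__ == "__main__":
    # Hypothetical values: two IQ criteria (x, y) and chronological age (z).
    print(round(partial_r(0.995, 0.30, 0.32), 4))
```

When z is uncorrelated with both variables, the partial coefficient equals the simple one. Any correlation of both measures with age is removed before they are compared.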


ACCURACY OF IQ’S FROM PAIRS OF GROUP TESTS AS MEASURES OF AN
INDIVIDUAL’S INTELLIGENCE


I have found the difference between each pupil’s mean IQ on each pair
of group tests and his IQ on each criterion. These differences
are error scores, and are all treated as positive quantities. The
relative accuracy of IQ’s from the 36 pairs of tests is represented by
the relative magnitude of these error scores. The median and
semi-interquartile range of these error scores are shown in Table II,
with the pairs of tests arranged in order of size of median errors
according to the first criterion. In Table I similar data are shown for
each group test, the tests being arranged in the order in which they were given.
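
The error scores and their summary statistics can be sketched in Python as follows. This assumes, as the text later states, that a pair's IQ is the unweighted mean of the two group-test IQ's. The paper does not say how its quartiles were located. The sketch uses `statistics.quantiles` with its default "exclusive" method, so small differences from hand-computed quartiles are possible. The function names and the four pupils' IQs are hypothetical.

```python
from statistics import median, quantiles


def pair_error_scores(test_a, test_b, criterion):
    """Error score per pupil: |mean of the two group-test IQs - criterion IQ|."""
    return [abs((a + b) / 2 - c) for a, b, c in zip(test_a, test_b, criterion)]


def median_and_q(errors):
    """Median error and semi-interquartile range Q = (Q3 - Q1) / 2."""
    q1, _, q3 = quantiles(errors, n=4)
    return median(errors), (q3 - q1) / 2


if __name__ == "__main__":
    # Hypothetical IQs for four pupils (the paper's raw scores are not given).
    test_a = [100, 110, 90, 120]
    test_b = [104, 106, 96, 110]
    criterion = [100, 105, 95, 118]
    errors = pair_error_scores(test_a, test_b, criterion)
    print(errors)                # [2.0, 3.0, 2.0, 3.0]
    print(median_and_q(errors))  # (2.5, 0.5)
```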


TABLE I. DIFFERENCES BETWEEN GROUP-TEST IQ’S AND CRITERION IQ’S OF
108 FIRST-YEAR JUNIOR HIGH SCHOOL PUPILS

                   First criterion     Third criterion
Group test         Median    Q         Median    Q
Miller             9.00      5.47      13.67     7.75
Otis               7.89      4.57      11.00     6.55
Illinois           5.89      2.86      8.10      4.00
Terman             5.50      3.20      7.86      4.97
Haggerty           5.33      3.14      7.75      4.87
National A         5.57      3.05      8.44      5.36
Dearborn C         6.00      3.88      6.90      4.92
Dearborn D         (illegible)         9.33      5.50
Pintner            (illegible)         (illegible)
Average            7.07      4.01      9.78      5.79

¹ See K. Pearson, Proceedings of the Royal Society, London, 1897, Vol. LX,
pp. 489-498; Thomson and Pintner, Journal of Educational Psychology, 1924,
Vol. XV, pp. 433-444.

TABLE II. DIFFERENCES BETWEEN IQ’S FROM PAIRS OF GROUP TESTS AND
CRITERION IQ’S OF 108 FIRST-YEAR JUNIOR HIGH SCHOOL PUPILS

                           First criterion     Third criterion
Pair of tests              Median    Q         Median    Q
(illegible)                3.46      2.08      6.72      4.08
(illegible)                3.75      2.19      7.00      4.20
Dearborn C-Terman          3.83      2.21      7.20      4.72
(illegible)                3.94      2.30      7.29      4.34
(illegible)                4.00      2.47      7.75      4.17
Dearborn D-Haggerty        4.07      3.21      8.18      4.64
Dearborn D-Illinois        4.17      2.06      7.89      4.23
National A-Terman          4.20      2.71      7.60      4.97
(illegible)                4.30      2.73      8.00      4.39
Haggerty-National A        4.31      2.50      8.00      4.36
Dearborn C-Haggerty        4.38      2.52      7.28      4.20
Dearborn C-Dearborn D      4.44      2.99      7.50      4.35
Dearborn C-National A      4.50      2.87      8.15      4.86
Dearborn D-Terman          4.69      2.85      9.00      4.73
(illegible)                4.75      2.79      9.00      4.25
Dearborn D-National A      5.17      2.66      8.57      4.20
(illegible)                5.33      2.78      9.00      5.12
(illegible)                5.56      3.64      9.50      5.40
(illegible)                5.67      2.75      8.50      4.89
(illegible)                5.69      3.21      8.38      4.96
(illegible)                5.71      3.65      9.33      5.45
(illegible)                5.75      2.90      9.20      5.35
(illegible)                5.82      3.33      9.70      4.54
(illegible)                6.00      3.67      9.40      5.55
(illegible)                6.00      3.71      8.92      5.97
Dearborn D-Miller          6.13      3.97      10.56     5.90
(illegible)                6.38      2.80      10.40     5.10
(illegible)                6.67      3.81      10.25     5.93
(illegible)                6.70      3.98      9.25      5.67
Haggerty-Pintner           7.08      3.55      10.00     5.06
Dearborn D-Pintner         7.29      4.57      10.00     5.87
National A-Pintner         7.63      4.17      11.00     5.75
Miller-Otis                7.67      4.59      11.60     6.53
(illegible)                8.00      4.17      11.33     5.83
(illegible)                8.00      4.59      12.50     5.83
Dearborn C-Pintner         8.13      3.67      10.33     6.04
Average                    5.53      3.18      9.01      5.04

From these tables it appears that according to the first criterion
50 per cent of the IQ’s from a group test are, on the average, in error
7.07 points or more, and that 50 per cent of the IQ’s from a pair of
group tests are, on the average, in error 5.53 points or more; that is,
using two group tests instead of one reduces the median error by
approximately one and one-half points, on the average, and the
semi-interquartile range by less than one point (from 4.01 to 3.18).
According to the third criterion the median error of IQ averages 9.78
points for one test and 9.01 points for two tests, whereas the Q’s of the
errors are 5.79 and 5.04 points, respectively. Combining the IQ’s from two
group intelligence tests reduces the errors or inaccuracies
approximately 21 per cent according to the first criterion, and 10 per cent
according to the third criterion.
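
The paper does not say how the 21 and 10 per cent figures were obtained. One reading reproduces them: average the percentage reductions in the median error and in Q. The Python sketch below shows that reading. It is an assumption, not the author's stated method, and the names are ours.

```python
# Averages of the median error and Q, in points of IQ, from Tables I and II.
AVERAGES = {
    "first criterion": {"one test": (7.07, 4.01), "two tests": (5.53, 3.18)},
    "third criterion": {"one test": (9.78, 5.79), "two tests": (9.01, 5.04)},
}


def percent_reduction(one, two):
    """Reduction from `one` to `two`, as a percentage of `one`."""
    return 100 * (one - two) / one


def summarize(averages):
    """Per criterion: (% cut in median error, % cut in Q, mean of the two)."""
    out = {}
    for criterion, avg in averages.items():
        (med1, q1), (med2, q2) = avg["one test"], avg["two tests"]
        med_cut = percent_reduction(med1, med2)
        q_cut = percent_reduction(q1, q2)
        out[criterion] = (med_cut, q_cut, (med_cut + q_cut) / 2)
    return out


if __name__ == "__main__":
    for criterion, (med_cut, q_cut, mean_cut) in summarize(AVERAGES).items():
        print(f"{criterion}: median {med_cut:.1f}%, Q {q_cut:.1f}%, "
              f"mean {mean_cut:.1f}%")
```

Under the first criterion the two cuts are nearly equal, about 22 and 21 per cent. Under the third they differ more, about 8 and 13 per cent. The single figure of 10 per cent therefore hides some spread.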

This is a significant reduction. But of even greater significance is
the fact that the median error from some pairs of tests is as large as or
larger than that from one of the more accurate group tests by itself;
e.g., according to the first criterion the median error of the Haggerty,
Terman, Illinois, or National A is less than 6 points, whereas 13 pairs
of group tests yield larger median errors. The Q of the errors of the
same four tests is 3.2 or less, whereas 17 pairs of tests have larger Q’s.

According to the third criterion the four most accurate tests (Dear- 
born C, Haggerty, Terman, and Illinois) have median errors of 8.10 
points or less, and Q’s of 4.97 points or less. Twenty-five pairs of 
tests have greater median errors and 17 pairs have larger Q’s. The 
median error of the most accurate test, Dearborn C, is smaller than 
that from any of the remaining 35 pairs of tests, while its Q is less 
than that of 19 pairs. 
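
The counts in the last two paragraphs can be checked against the columns of Table II. The Python sketch below assumes that "larger" means larger than the largest value among the four best single tests. The names are ours. The table averages are also recomputed as a check on the transcription.

```python
# Table II rows in printed order: (first-criterion median, first-criterion Q,
# third-criterion median, third-criterion Q), all in points of IQ.
PAIRS = [
    (3.46, 2.08, 6.72, 4.08), (3.75, 2.19, 7.00, 4.20),
    (3.83, 2.21, 7.20, 4.72), (3.94, 2.30, 7.29, 4.34),
    (4.00, 2.47, 7.75, 4.17), (4.07, 3.21, 8.18, 4.64),
    (4.17, 2.06, 7.89, 4.23), (4.20, 2.71, 7.60, 4.97),
    (4.30, 2.73, 8.00, 4.39), (4.31, 2.50, 8.00, 4.36),
    (4.38, 2.52, 7.28, 4.20), (4.44, 2.99, 7.50, 4.35),
    (4.50, 2.87, 8.15, 4.86), (4.69, 2.85, 9.00, 4.73),
    (4.75, 2.79, 9.00, 4.25), (5.17, 2.66, 8.57, 4.20),
    (5.33, 2.78, 9.00, 5.12), (5.56, 3.64, 9.50, 5.40),
    (5.67, 2.75, 8.50, 4.89), (5.69, 3.21, 8.38, 4.96),
    (5.71, 3.65, 9.33, 5.45), (5.75, 2.90, 9.20, 5.35),
    (5.82, 3.33, 9.70, 4.54), (6.00, 3.67, 9.40, 5.55),
    (6.00, 3.71, 8.92, 5.97), (6.13, 3.97, 10.56, 5.90),
    (6.38, 2.80, 10.40, 5.10), (6.67, 3.81, 10.25, 5.93),
    (6.70, 3.98, 9.25, 5.67), (7.08, 3.55, 10.00, 5.06),
    (7.29, 4.57, 10.00, 5.87), (7.63, 4.17, 11.00, 5.75),
    (7.67, 4.59, 11.60, 6.53), (8.00, 4.17, 11.33, 5.83),
    (8.00, 4.59, 12.50, 5.83), (8.13, 3.67, 10.33, 6.04),
]

# The four most accurate single tests under each criterion: (median, Q), Table I.
BEST_FIRST = {"Haggerty": (5.33, 3.14), "Terman": (5.50, 3.20),
              "Illinois": (5.89, 2.86), "National A": (5.57, 3.05)}
BEST_THIRD = {"Dearborn C": (6.90, 4.92), "Haggerty": (7.75, 4.87),
              "Terman": (7.86, 4.97), "Illinois": (8.10, 4.00)}


def count_larger(values, threshold):
    """Number of pairs whose error statistic exceeds `threshold`."""
    return sum(1 for v in values if v > threshold)


first_median, first_q, third_median, third_q = (list(col) for col in zip(*PAIRS))

counts = {
    "first median": count_larger(first_median, max(m for m, _ in BEST_FIRST.values())),
    "first Q": count_larger(first_q, max(q for _, q in BEST_FIRST.values())),
    "third median": count_larger(third_median, max(m for m, _ in BEST_THIRD.values())),
    "third Q": count_larger(third_q, max(q for _, q in BEST_THIRD.values())),
    "Dearborn C median": count_larger(third_median, BEST_THIRD["Dearborn C"][0]),
    "Dearborn C Q": count_larger(third_q, BEST_THIRD["Dearborn C"][1]),
}

if __name__ == "__main__":
    print(counts)
    print([round(sum(col) / len(PAIRS), 2)
           for col in (first_median, first_q, third_median, third_q)])
```

The resulting counts, 13, 17, 25, 17, 35 and 19, match the figures given in the text.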

If, then, we measure the accuracy of tests by the median and Q
of the differences between IQ’s obtained from them and the criterion
IQ’s, it appears that two tests do not always measure a child’s IQ more
accurately than one test alone if we merely take the mean of his IQ’s
from the two group tests, i.e., give both tests equal weight. Further
examination of these data shows that, according to both criteria,
combining Pintner IQ’s with those from each of the other eight tests gives
results less accurate than those from any one of the four most accurate
tests alone, and that of the eight pairs involving Miller, four pairs
according to the first criterion and seven pairs according to the third
criterion are less accurate than any one of the four most accurate tests
alone. According to the third criterion the accuracy of Dearborn C is not
increased by combining it with any one of the other eight tests.

In Table III are shown for each test the tests which increase its 
accuracy according to each criterion. 


TABLE III. TESTS WHICH INCREASE THE ACCURACY OF OTHER TESTS WHEN THE
TWO ARE COMBINED BY TAKING THE MEAN OF THEIR IQ’S¹

Test          First criterion                    Third criterion
Miller        Any one of the other eight.        Any one of the other eight.
Otis          Any one except Pintner.            Any one except Miller or Pintner.
Illinois      Any one except Pintner.            Any one except Miller, Otis, or
                                                 Pintner.
Terman        Any one except Miller or           Dearborn C, Haggerty, Illinois, or
              Pintner.                           National A.
Haggerty      Any one except Miller or           Dearborn C or Illinois.
              Pintner.
National A    Any one except Miller, Otis, or    Dearborn C, Haggerty, Illinois, or
              Pintner.                           Terman.
Dearborn C    Any one except Pintner.            None of the other eight.
Dearborn D    Any one except Pintner.            Any one except Miller or Pintner.
Pintner       Any one of the other eight.        Any one of the other eight.

¹ This table reads as follows: according to the first criterion, any one of
the other eight tests used in this investigation increases the accuracy of
the Miller when it and the Miller are combined; any of the other eight tests
except Pintner increases the accuracy of the Otis test; etc.


It also appears that Haggerty-Illinois, Dearborn C-Illinois, and
Dearborn C-Terman are the three most accurate pairs of tests according
to either criterion, the median error being less than 4 points of IQ
and the Q less than 2½ points according to the first criterion, and
no more than about 7.2 and 4.7 points of IQ, respectively, according to
the third criterion. Pintner and Miller give the least accurate measure of a
pupil’s IQ according to our data.


THE ACCURACY OF IQ’S FROM PAIRS OF GROUP TESTS AS THE BASIS
FOR SECTIONING ACCORDING TO INTELLIGENCE


I have tried to determine the relative accuracy of the IQ’s from the 
36 pairs of tests as a basis for dividing a class into sections homo- 
geneous in respect to intelligence. I have used two methods. 

1. The simple multiple correlations and the partial multiple
correlations (chronological age constant) between each criterion and the
IQ’s from each of the 36 pairs of group tests.

2. The agreement of four-fold sectioning on each criterion with 
similar sectioning on the mean IQ from each of the 36 pairs of tests. 
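
The details of the four-fold sectioning are not given in this part of the paper. The Python sketch below is one plausible way to measure agreement between two such sectionings. It assumes that pupils are ranked and cut into four equal quarters, and that agreement is the proportion of pupils placed in the same quarter by both measures. The function names and the eight pupils' IQs are hypothetical. With 108 pupils each quarter holds 27.

```python
def fourfold_sections(scores):
    """Section 0 (lowest quarter) to 3 (highest quarter) for each pupil, by rank.

    Ties are broken by list order; quarters are equal only when the number of
    pupils is divisible by four.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    sections = [0] * n
    for rank, pupil in enumerate(order):
        sections[pupil] = rank * 4 // n
    return sections


def sectioning_agreement(criterion_iqs, estimated_iqs):
    """Proportion of pupils placed in the same quarter by both measures."""
    by_criterion = fourfold_sections(criterion_iqs)
    by_estimate = fourfold_sections(estimated_iqs)
    same = sum(a == b for a, b in zip(by_criterion, by_estimate))
    return same / len(by_criterion)


if __name__ == "__main__":
    # Hypothetical IQs for eight pupils: criterion and mean of a pair of tests.
    criterion = [90, 95, 100, 105, 110, 115, 120, 125]
    pair_mean = [92, 101, 96, 104, 112, 113, 126, 121]
    print(fourfold_sections(pair_mean))
    print(sectioning_agreement(criterion, pair_mean))
```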







The multiple-correlation technique weights each test so as to give 
the best additive combination of it with every other one for the cri- 
terion employed, although in most cases we do not know in advance 
just what weights to use. Actual sectioning follows the plan often 
used when two group tests are combined by giving the IQ’s from each 
test a weight of one. 
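
For two tests the "best additive combination" has a closed form in terms of three correlations. These are each test's correlation with the criterion and the two tests' correlation with each other. The Python sketch below shows that standard formula, which works on standardized scores. If the three inputs are partial correlations with chronological age held constant, the result is the corresponding partial multiple correlation. The paper does not show its computations. The pairwise intercorrelations of the group tests are not listed in this section either, so the usage values are hypothetical.

```python
import math


def two_predictor_multiple_r(r1c, r2c, r12):
    """Multiple correlation of a criterion with the best-weighted sum of two
    predictors, from r1c, r2c (each predictor with the criterion) and r12
    (the predictors with each other). Also returns the standardized weights."""
    beta1 = (r1c - r2c * r12) / (1 - r12 ** 2)
    beta2 = (r2c - r1c * r12) / (1 - r12 ** 2)
    r_squared = beta1 * r1c + beta2 * r2c
    return math.sqrt(r_squared), (beta1, beta2)


if __name__ == "__main__":
    # Hypothetical correlations, for illustration only.
    r, (b1, b2) = two_predictor_multiple_r(0.80, 0.70, 0.50)
    print(round(r, 3), round(b1, 3), round(b2, 3))
```

The more closely the two tests correlate with each other, the less the second test adds beyond the first.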

Since we are interested in the size of the multiple correlations apart
from any spurious correlation due to the chronological-age distribution
of the group tested, I have stripped all multiples of any such effect
of chronological age by rendering it constant. These multiple
correlation coefficients are shown in Table V, arranged in order of their
magnitude with the first criterion. Table IV shows the partials for each
group test alone, arranged, however, in the order in which the tests were given.
TABLE IV. PARTIAL COEFFICIENTS OF CORRELATION (CHRONOLOGICAL AGE
CONSTANT) BETWEEN IQ’S FROM EACH GROUP TEST AND FROM FIRST AND
THIRD CRITERIA OF INTELLIGENCE. N = 108

Test               First criterion   Third criterion
Miller             .736              .500
Otis               .885              .689
Illinois           .738              .547
Terman             .796              .681
Haggerty           .793              .695
National A         .775              .628
Dearborn C         .761              .640
Dearborn D         .737              .575
Pintner            .460              .348








Twenty-seven of the pairs of tests yield multiple correlations with
the first criterion which are less than the highest correlation between a
single test (Otis) and the first criterion. Fifteen pairs of tests do not
correlate as closely with the third criterion as does either the Otis or
the Haggerty. That is, if we combine these tests two at a time, using the
best weights we can find and at the same time partialling out chronological
age, the Otis is more like the first criterion than 27 of the 36
combinations, and the Otis or Haggerty is more like the Stanford-Binet than
15 of the 36 combinations. Approximately the same thing is true
of the simple and multiple correlations, i.e., if chronological age is
not partialled out. Are we then sure that two group tests combined
are more accurate than one? May it not depend a great deal upon
which two and which one are compared? Manifestly here is a problem
needing further investigation.









































TABLE V.—MULTIPLE COEFFICIENTS OF CORRELATION, CHRONOLOGICAL AGE CONSTANT. N = 108

Pair of tests                     First criterion   Third criterion
Otis and [illegible]                   .975              .800
Otis and [illegible]                   .948              .749
Otis and [illegible]                   .938              .760
Otis and [illegible]                   .932              .725
Otis and [illegible]                   .930              .749
Otis and [illegible]                   .925              .711
Otis and [illegible]                   .913              .708
Otis and [illegible]                   .911              .654
National A and Terman                  .901              .727
Dearborn C and Haggerty                .882              .759
Dearborn C and Terman                  .871              .740
[illegible]                            .869              .752
[illegible]                            .864              .705
[illegible]                            .864              .706
Dearborn D and Terman                  .862              .714
[illegible]                            .862              .716
[illegible]                            .853              .688
Dearborn C and Illinois                .851              .681
Dearborn C and National A              .851              .702
Haggerty and National A                .850              .722
Pintner and Terman                     .850              .712
[illegible]                            .844              .660
[illegible]                            .843              .644
Dearborn C and Dearborn D              .834              .680
Dearborn D and National A              .831              .663
[illegible]                            .831              .633
Dearborn D and Haggerty                .830              .704
Haggerty and Pintner                   .828              .711
Dearborn D and Miller                  .825              .607
Illinois and National A                .811              .638
National A and Pintner                 .811              .649
[illegible]                            .799              .571
[illegible]                            .779              .544
Dearborn C and Pintner                 .778              .646
Dearborn D and Pintner                 .768              .596
[illegible]                            .768              .571











It is noteworthy that the Otis, apart from chronological age, corre- 
lates .885 with the first criterion, and that the eight highest multiple 
coefficients are obtained by combining Otis with each of the remaining 
eight tests, the coefficients ranging from .911 to .975. Furthermore, 
Pintner or Dearborn D is in ten of the lowest third of the multiple 













correlations with first criterion. Otis, combined with Haggerty, 
National A, Terman, Dearborn C, or Dearborn D, gives five of the 
nine highest multiple coefficients with the third criterion, whereas 
Miller or Pintner is found in nearly all of the lowest third. 

Four-fold sectioning upon IQ’s from one group test gives, on the 
average, 61.5 per cent correct classification according to the first 
criterion (pupil placed in same section by group-test IQ as by criterion), 
34.7 per cent placed one section too low or high, 3.7 per cent displaced 
two sections, and .1 per cent placed three sections too low or high. 
(See Tables VI and VII.) Displacement of one section is not neces- 
sarily a very serious error, since the pupil may belong very near the 
dividing line, but displacement of two or three sections is a serious 
error, the latter meaning that a top-section pupil is placed in the low- 
est section, or vice versa. 
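Each per cent in Tables VI to VIII comes from comparing two rankings of the same 108 pupils. The Python sketch below shows one way to tabulate them. It assumes that each pupil is placed in one of four equal sections by rank, once on the predictor and once on the criterion. It also assumes that tied scores are broken by the order in which pupils are listed, because the paper does not say how ties were handled. The helper names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def four_fold_sections(scores: list[float]) -> list[int]:
    """Section 0 holds the top fourth by score, section 3 the bottom fourth."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    sections = [0] * n
    for rank, pupil in enumerate(order):
        sections[pupil] = rank * 4 // n  # equal-sized sections by rank
    return sections


def displacement_per_cent(predictor: list[float], criterion: list[float]) -> list[float]:
    """Per cent of pupils displaced by 0, 1, 2, and 3 sections."""
    a = four_fold_sections(predictor)
    b = four_fold_sections(criterion)
    counts = Counter(abs(x - y) for x, y in zip(a, b))
    return [100 * counts[d] / len(a) for d in range(4)]
```

With 108 pupils, `rank * 4 // 108` puts exactly 27 pupils in each section, and the first entry of the result is the "per cent correctly sectioned".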


TABLE VI.—AGREEMENT OF FOUR-FOLD SECTIONING ON IQ’s FROM ONE GROUP TEST WITH SECTIONING ON CRITERION IQ’s. N = 108

                      First criterion                      Third criterion
              Per cent   Per cent displaced by      Per cent   Per cent displaced by
              correctly  1 sec.  2 sec.  3 sec.     correctly  1 sec.  2 sec.  3 sec.
Test          sectioned                             sectioned
Miller          58.3      38.9     2.8    0.0         48.2      41.7     9.2    0.9
Otis            63.9      31.5     4.6    0.0         55.6      39.8     3.7    0.9
Illinois        63.9      33.3     2.8    0.0         49.1      44.4     6.5    0.0
Terman          63.9      34.3     0.9    0.9         51.9      44.4     3.7    0.0
Haggerty        66.7      31.5     1.8    0.0         60.2      37.0     2.8    0.0
National A      63.0      35.2     1.8    0.0         52.8      40.7     6.5    0.0
Dearborn C      65.8      33.3     0.9    0.0         50.9      48.2     0.9    0.0
Dearborn D      57.4      38.9     3.7    0.0         55.6      35.2     9.2    0.0
Pintner         50.9      35.2    13.9    0.0         47.2      35.2    17.6    0.0
Mean            61.5      34.7     3.7    0.1         52.4      40.7     6.7    0.2














Sectioning on mean IQ from two group tests gives, on the average,
68.9 per cent correct classification, 30.0 per cent displacement of one
section, 1.1 per cent displacement of two sections, and no displace-
ment of three sections, according to the first criterion; that is, adding
a second group test raises correct classification by 7.4 per cent and
lowers displacement by one, two, or three sections by 4.7, 2.6, and
0.1 per cent, respectively. Haggerty gives the most accurate sectioning
on the first criterion, the percentages being 66.7, 31.5, 1.8, and 0.0,
respectively. Using Dearborn C with Haggerty increases accuracy of
sectioning to 78.7 per cent, 20.4 per cent, 0.9 per cent, and 0.0 per
cent, respectively, while Illinois adds almost as much to it. Haggerty
by itself gives more accurate sectioning on the first criterion than the 9
least accurate pairs of tests. Dearborn C or Illinois each gives more
accurate sectioning alone than when combined with Pintner. Haggerty
or Otis alone is as accurate as when combined with Pintner.

According to the third criterion one group test gives, on the aver- 
age, 52.4 per cent correct classification, 40.7 per cent displacement by 
one section, 6.7 per cent displacement by two sections, and 0.2 per 
cent displacement by three sections. Using a second group test 
increases the accuracy of classification on the average to 57.6 per cent, 
37.9 per cent, 4.4 per cent, and 0.1 per cent, respectively. Haggerty 
is the most accurate also according to the third criterion, the percent- 
ages being 60.2, 37.0, 2.8, and 0.0, respectively. Using Illinois or 
National A with Haggerty increases the accuracy of classification to 
64.8 per cent, 31.5 per cent, 3.7 per cent, and 0.0 per cent, respectively. 
Haggerty alone gives more accurate classification on Stanford-Binet 
than 27 of the 36 pairs of tests. Dearborn C, Dearborn D, Haggerty, 
or Otis is more accurate alone than in combination with Pintner. 
Dearborn D, Haggerty, or Otis is more accurate alone than in combina- 
tion with Miller. Miller or Illinois adds little to the accuracy of sec- 
tioning by National A. 

Dearborn C-Haggerty, Dearborn C-Illinois, Dearborn C-Terman, 
and Haggerty-Illinois give the most accurate sectioning on the first 
criterion, the percentages averaging 77.1, 22.7, 0.2, and 0.0, respec- 
tively. One-half of the pairs of tests give 70 per cent or more correct 
classification. No pair of tests shows a displacement of three sec- 
tions on first criterion, and 12 pairs show no displacement of two 
sections. When used alone Pintner, Miller, and Dearborn D are 
the least accurate for sectioning on this criterion, and one of these 
tests is found in 15 of the 18 least accurate pairs of tests. 

On the third criterion Haggerty-Illinois, Haggerty-National A, 
and Haggerty-Terman give the most accurate sectioning, the percent- 
ages averaging 64.5, 32.1, 3.4, and 0.0, respectively. Ten pairs of tests 
give 60 per cent or more correct classification. Just three pairs of 
tests show displacement by three sections, but only two pairs show no 
displacement by two sections. When used alone, Pintner, Miller,
and Illinois give the least accurate sectioning according to the Stan-
ford-Binet criterion; Miller or Pintner is found in 12 of the 16 least
accurate pairs of tests.

TABLE VII.—AGREEMENT OF FOUR-FOLD SECTIONING ON IQ’s FROM PAIRS OF GROUP TESTS WITH SECTIONING ON CRITERION IQ’s. N = 108

[Columns as in Table VI: per cent correctly sectioned and per cent displaced by 1, 2, and 3 sections, for the first and third criteria. Rows: the 36 pairs of tests and their mean. The cell values are illegible in the source scan.]

The significance of such accuracy figures may be seen by a study of
Tables VI, VII, and VIII. IQ’s from one or two group tests are, on
the average, superior to chronological age, sixth-grade marks, or
fifth- and sixth-grade marks as a basis for sectioning on the first
criterion IQ’s. It is remarkable that sectioning by chronological age
(placing youngest fourth in top section, etc.) gives about as good
classification according to Stanford-Binet IQ as classification upon IQ
from Miller, Illinois, Pintner, or Dearborn C and Pintner combined.

TABLE VIII.—ACCURACY OF FOUR-FOLD SECTIONING ON VARIOUS BASES ACCORDING TO FIRST AND THIRD CRITERIA OF IQ

                        First criterion                      Third criterion
Basis of         Per cent   Per cent displaced by     Per cent   Per cent displaced by
sectioning       correctly  1 sec.  2 sec.  3 sec.    correctly  1 sec.  2 sec.  3 sec.
                 sectioned                            sectioned
Chance             25.0      37.5    25.0   12.5
Chronological age  50.0      38.9    11.1    0.0        51.9      29.6    18.5    0.0
6th-grade marks    43.5      42.6    12.0    1.8        42.6      40.7    14.8    1.8
5th- and 6th-
  grade marks      48.1      40.7     9.3    1.8        44.4      40.7    13.0    1.8
One group test     61.5      34.7     3.7    0.1        52.4      40.7     6.7    0.2
Two group tests    68.9      30.0     1.1    0.0        57.6      37.9     4.4    0.1
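The Chance row of Table VIII (25.0, 37.5, 25.0, 12.5) can be obtained by counting. This assumes that it describes placement independent of the criterion, with four sections of equal size. Of the 16 equally likely pairs of (assigned section, criterion section), 4 agree, 6 differ by one section, 4 by two, and 2 by three. A short Python check:

```python
from __future__ import annotations

from fractions import Fraction
from itertools import product


def chance_displacement(k: int = 4) -> list[Fraction]:
    """Probability of displacement by 0..k-1 sections when placement ignores the criterion."""
    probs = [Fraction(0)] * k
    for assigned, actual in product(range(k), repeat=2):
        probs[abs(assigned - actual)] += Fraction(1, k * k)
    return probs


print([float(100 * p) for p in chance_displacement()])  # [25.0, 37.5, 25.0, 12.5]
```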

The superiority of the group-test IQ over other bases of sectioning 
is more marked in four-fold sectioning than in three-fold sectioning on 
IQ. This may mean that the group intelligence test becomes increas- 
ingly valuable when a finer differentiation of ability is desired. Fur- 
ther research on this problem is needed. 

Unfortunately we do not know in advance, in any case, just what
are the best weights to use in combining IQ’s from two group tests.
We have determined a few of the partial regression coefficients from
our data. They vary from approximately equal weights for the two
tests of a pair to weights in the ratio of 3 to 1 for Otis and Pintner on
the first criterion. The relative accuracy of IQ’s from pairs of tests as
determined by their multiple correlations with the criterion is not the
same as their relative accuracy as determined by comparing sectioning
on them with sectioning on the criterion: the rank correlation between
these two orderings of the 36 pairs of tests is ρ = .630 for the first
criterion and ρ = .569 for the Stanford-Binet. The ranks of the pairs
of tests in accuracy of sectioning and in accuracy of measuring
individuals have a correlation of ρ = .701 for the first criterion and of
ρ = .594 for the third criterion. The ranks of the pairs of tests as to
multiple correlations with the criterion and as measures of individuals
have correlations of ρ = .255 and ρ = .398 for the two criteria,
respectively.
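Each ρ above is a rank-difference correlation between two orderings of the same 36 pairs of tests. The Python sketch below implements the usual formula ρ = 1 − 6Σd²/[n(n² − 1)]. It assumes that tied values receive the mean of the ranks they occupy. When ties are present, the formula is only an approximation to the product-moment correlation of the ranks.

```python
from __future__ import annotations


def ranks(values: list[float]) -> list[float]:
    """Rank 1 = largest value; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            result[order[k]] = shared
        start = end + 1
    return result


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Rank-difference correlation 1 - 6 * sum(d^2) / (n(n^2 - 1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))
```

Used on, say, the 36 multiple correlations and the 36 per cents correctly sectioned, it gives a coefficient of the kind reported above.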


CONCLUSIONS 


1. On the average, the IQ’s from any two of the group tests used 
in this study have a median error of 5.53 points, and a Q of 3.18 points 
according to the first criterion, and a median error of 9.01 points and 
a Q of 5.04 points according to the Stanford-Binet criterion. Two 
tests measure an individual’s IQ approximately 21 per cent more 
accurately than one test according to the first criterion, and 10 per 
cent more accurately according to the third. 

2. Illinois, combined with Haggerty or Dearborn C, and Dearborn 
C combined with Terman, give the most accurate measures of an 
individual’s IQ according to either criterion, the median error being 
less than 4 points with a Q of 2½ points on first criterion, and around
7 and 5 points, respectively, on the third criterion. Miller or Pintner 
is found in nearly all of the least accurate third of the pairs of tests, 
according to both criteria. 

3. Using any two of these tests and combining their IQ’s with a
weight of one for each does not invariably give a more accurate measure
of an individual than one test alone, because on the first criterion one-third
of the pairs of tests are less accurate than Haggerty, Terman, Illinois, 
or National A alone, whereas on the third criterion two-thirds of the 
pairs of tests measure individuals less accurately than Dearborn C, 
Haggerty, Terman, or Illinois alone. 

4. Four-fold sectioning on IQ’s from a pair of group tests gives, on
the average, 68.9 per cent correct classification according to the first 
criterion of IQ, 30.0 per cent placed one section too low or high, 1.1 
per cent placed two sections too low or high, and no displacement by 
three sections. The percentages for one test alone are 61.5, 34.7, 
3.7, and 0.1, respectively. According to the Stanford-Binet criterion,
two group tests give an average correct classification of 57.6 per
cent, and displacements by one, two, or three sections of 37.9 per cent,
4.4 per cent, and 0.1 per cent, respectively; the percentages for one
test alone average 52.4, 40.7, 6.7, and 0.2, respectively.

5. The IQ’s from one-fourth of the pairs of these group tests show 
less accurate sectioning according to the first criterion than that from 
Haggerty alone. The three least accurate tests (Dearborn D, Miller, 
and Pintner) help form nearly all of the less accurate 50 per cent of 
the pairs of tests. Using Stanford-Binet IQ’s as criterion, we find 
that Haggerty alone gives more accurate classification than 27 of
the 36 pairs of tests. Pintner and Miller are again the least accurate, 
and help form 12 of the 16 least accurate pairs of tests. Our data 
indicate that combining the results from two group tests does not 
always insure more accurate classification according to intelligence 
than would be secured from using one test alone. 

6. IQ’s from one or two group tests are a more accurate basis for
sectioning on IQ than chronological age, sixth-grade marks, or fifth- 
and sixth-grade marks, their superiority apparently being greater 
when a finer gradation of ability is sought. According to the Stan- 
ford-Binet criterion chronological age is, however, about as accurate 
as Miller, Illinois, Pintner, or a combination of Dearborn C and 
Pintner. 

7. One-fourth of the pairs of tests have multiple correlations 
(chronological age constant) with first criterion from .901 to .975. 
One-fourth of them have multiple correlations with Stanford-Binet 
criterion, chronological age again constant, from .725 to .800. Otis 
by itself correlates more closely with first criterion (chronological age 
constant) than three-fourths of the pairs of tests, and with third 
criterion more closely than nearly one half of the pairs of tests. 

8. Our data indicate that the tests correlating most closely with
the criterion rank high, but not highest, in accuracy of sectioning, and
that they are not necessarily the most accurate in measuring a pupil’s
IQ; the same is true of pairs of tests. On the other hand, the tests
having high accuracy as measures of a pupil’s IQ usually rank high in
accuracy of sectioning.

9. Further investigation covering a wider range of tests is needed 
to determine the accuracy of those appropriate to other grade levels,
such as first, fourth, and probably twelfth or first-year college, and to 
discover the increase in precision that results from using two group 
intelligence tests instead of one. 
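Conclusions 1 to 3 summarize each test or pair of tests by a median error and a Q, with pairs combined by giving each test a weight of one. The section that defines these measures is not reproduced here. The Python sketch below therefore rests on two assumptions: an individual's error is the absolute difference between the group-test IQ and the criterion IQ, and Q is the semi-interquartile range, (Q3 − Q1)/2, of those errors. It also relies on `statistics.quantiles` with its default "exclusive" interpolation, which may differ slightly from the hand methods of the period. The function names are hypothetical.

```python
from __future__ import annotations

from statistics import median, quantiles


def combine_unit_weights(iq_a: list[float], iq_b: list[float]) -> list[float]:
    """Mean of two IQ's, i.e. each test given a weight of one."""
    return [(a + b) / 2 for a, b in zip(iq_a, iq_b)]


def error_summary(group_iq: list[float], criterion_iq: list[float]) -> tuple[float, float]:
    """Median absolute error and semi-interquartile range (Q) of the errors."""
    errors = [abs(g - c) for g, c in zip(group_iq, criterion_iq)]
    q1, _, q3 = quantiles(errors, n=4)
    return median(errors), (q3 - q1) / 2
```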









































AN EMPIRICAL STUDY OF THE VALIDITY OF THE 
SPEARMAN-BROWN FORMULA AS APPLIED TO 
THE PURDUE RATING SCALE 


H. H. REMMERS, N. W. SHOCK, AND E. L. KELLY * 


Purdue University 


HISTORICAL SUMMARY


When the same test is given at different times to the same persons,
the correlation between the resulting scores gives a measure of the
reliability of the test used. It is a well-known fact that the reliability
of a test is increased as its length is increased, but for some time
there was no way of foretelling just how much the reliability would be
increased by increasing its length. There was need for a formula that
would predict the reliability of a test n times as long as a given test. In
1911 William Brown published a formula which promised to meet
this need. It was found to be only a special case of an earlier formula
of Spearman’s for finding the correlation between the sums or averages
of scores.² This revised formula is now in current use and is known
as the Spearman-Brown prediction formula. If r₁₂ is the coefficient
of reliability of a test, then rₙ, the coefficient of reliability of a test n
times as long, is given by the formula:³

$$ r_n = \frac{n\,r_{12}}{1 + (n - 1)\,r_{12}} $$

Since r₁₂ is always less than 1, rₙ will always be greater than r₁₂ (for
n greater than 1 and r₁₂ positive), and thus the reliability of the longer
test will increase with its length, though at a diminishing rate,
approaching 1 as n grows. In order for this formula to be valid, it must
be assumed that the test material is homogeneous in character or,
statistically speaking, that the intercorrelations between the units are
equal and their standard deviations are equal. Were these assumptions
strictly fulfilled, perfect prediction would result. In practice, however,
such ideal conditions are never found, so our problem becomes that of
determining just how accurate a prediction may be obtained for the
material at hand.
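The two assumptions just stated are enough to derive the formula. The derivation below is a standard textbook sketch, not necessarily the route taken by Spearman or Brown. Let the lengthened test X and its parallel form X′ each be the sum of n units. Every unit has standard deviation σ, and every pair of distinct units correlates r₁₂.

```latex
\begin{aligned}
\operatorname{Var}(X) &= n\sigma^2 + n(n-1)\,r_{12}\,\sigma^2
  = n\sigma^2\,[1 + (n-1)\,r_{12}] = \operatorname{Var}(X'),\\
\operatorname{Cov}(X, X') &= n^2\, r_{12}\,\sigma^2,\\
r_n &= \frac{\operatorname{Cov}(X, X')}{\sqrt{\operatorname{Var}(X)\,\operatorname{Var}(X')}}
  = \frac{n^2\, r_{12}\,\sigma^2}{n\sigma^2\,[1 + (n-1)\,r_{12}]}
  = \frac{n\,r_{12}}{1 + (n-1)\,r_{12}}.
\end{aligned}
```

If the units differ in variability, or their intercorrelations differ, the variance and covariance no longer reduce to these simple forms. This is why the prediction can only be approximate in practice.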

It was only natural that the introduction of such a formula would
be of much interest to all those interested in tests and measurements.
Accordingly, studies soon began to determine the validity of the
formula for various classes of material. Holzinger,⁴ in studying the
application of the formula to intelligence test material, found that the
formula tended to predict the obtained reliability with fair accuracy for
a pool of five tests or fewer, but for greater numbers the formula over-
estimated the empirical results obtained. His test material, however,
violated to some extent the major assumption, i.e., homogeneity of
test items. Later, Holzinger and Clayton⁵ applied the formula to
spelling words and self-administered test material, and in each case
found a close correspondence between the predicted and observed
results. Kelley,⁶ using Gordon’s data on lifted weights, also found a
close agreement between the predicted and observed results on this
material. Ruch and others⁷ applied the formula to spelling words,
and they came to the conclusion that it gave a meaningful prediction
for such test material. In two cases out of five, the tendency was to
overpredict the empirical scores slightly. Wood⁸ applied the same
formula to a set of scores on a hundred true-false questions given to
language students, and he found that it tended very nearly to predict,
or else to underestimate, the observed results. Furfey⁹ published
results while the present study was in progress, showing that the for-
mula slightly overestimated the reliability obtained by dividing the
test into a number of components.

* Responsibility for this study is divided as follows: Shock and Kelly collected
the data and carried out all statistical computations under the direction of Rem-
mers, who suggested the problem and the experimental and statistical procedure.


PURPOSE OF THIS STUDY


It is the purpose of this study to determine the validity of the 
Spearman-Brown prediction formula when applied to the personal 
rating scale now in use at Purdue University. If it is found to be a
valid means of predicting the reliability of the combined ratings of 
any number of judges, it will be quite useful to the personnel depart- 
ment in deciding on the number of judgments that should be obtained 
on an individual in order to obtain the desired reliability. At the 
present time, each engineering student is rated by five fellow students 
and by five instructors, all of his own choice. Thus the number of 
judgments has been arbitrarily placed at ten, without knowing 
whether it is too low to be dependable or too high to be economical. 
If it is shown that the Spearman-Brown formula will accurately predict 
the reliability for any given number of judgments with the scale, then 
the most efficient number of judgments, both for dependability and for 
economy, may be accurately determined. 
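Solving the Spearman-Brown formula for n gives n = r_target(1 − r₁)/[r₁(1 − r_target)], where r₁ (a symbol introduced here) is the reliability of a single judgment. The Python sketch below shows how a personnel office might use it. It assumes that the formula holds for judges exactly as it does for test units, which is the very question this study sets out to test. The function names are illustrative.

```python
import math


def spearman_brown(r1: float, n: float) -> float:
    """Predicted reliability of the pooled ratings of n judges, given single-judge reliability r1."""
    return n * r1 / (1 + (n - 1) * r1)


def judgments_needed(r1: float, target: float) -> int:
    """Smallest whole number of judgments whose predicted reliability reaches target."""
    if not (0 < r1 < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    exact = target * (1 - r1) / (r1 * (1 - target))
    # Subtract a tiny tolerance so floating-point noise cannot push a whole number up by one.
    return max(1, math.ceil(exact - 1e-9))


for target in (0.70, 0.80, 0.90):
    print(target, judgments_needed(0.251, target))
```

Table II reports a single-judgment reliability of .251. Starting from that figure, the sketch calls for 7, 12, and 27 judgments to reach predicted reliabilities of .70, .80, and .90. This holds only if the fraternity figure carried over to the ten judgments now collected in practice.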

In order that the reader may better understand the present study, 
a short description of the rating scale, which is now being used, is 













Spearman-Brown Formula and Purdue Scale 189 


given. It consists of a printed form containing a paragraph of instruc- 
tions on the upper part of the sheet, which explains the purpose of the 
questionnaire and asks the reader to rate the individual whose name 
appears filled in the blank. The scale itself consists of ten traits on 
which the ratings are made. These traits are: 
1. Address and Manner
2. Attitude
3. Character
4. Cooperative Ability
5. Disposition
6. Industry
7. Judgment
8. Initiative
9. Leadership
10. Mental Caliber
The ratings are made on a five point scale, five being considered 
perfect, one as very poor, and three as representing the average indi- 
vidual. Fractional judgments are permitted. It will thus be seen 
that fifty is the highest possible total score. 
Plice, in a recent study of the scale,¹⁰ came to the following conclusions:
1. “Students rate each other higher than teachers rate them.
2. “Teachers agree better on the rating of students than do the
students themselves.
3. “Teachers agreed fairly well with the students except on traits
7 and 10. (Trait 7 is Judgment and Trait 10 is Mental Caliber.)
4. “While the general agreement of teachers and students in rating
is low, it is somewhat higher than chance agreement.”




EXPERIMENTAL PROCEDURE 


If the Spearman-Brown prediction formula is to be used in predict- 
ing the reliability of a certain number of judgments, certain assump- 
tions must be made. In the first place it is assumed that one judge 
is just as competent to judge traits of character and personality in an 
individual as another. This follows from the basic assumption of the
original formula that the test material is homogeneous, since here we
have judges instead of test elements. Another
assumption that follows directly from this is that an individual judge 
can be treated as a constant factor, comparable to a test element. 



























It was felt that we should have a group as large as possible with 
not less than twenty-five who would judge each other conscientiously 
and who also knew one another well enough to be fairly accurate in 
their judgments. Hence, a fraternal group was chosen. Here several 
factors entered which are possibly not the same as those encountered 
in the actual field of personnel rating. In the first place all of these 
men knew each other more intimately than would ordinarily be true 
of some ratings obtained in practice, since they had lived in the same 
house with one another for almost an entire semester. This fact 
would tend to raise the reliability of the ratings. On the other hand, 
the fact that each man was rated by every other man in the house let
petty jealousy and feeling creep into the ratings in some instances. 
This would tend to lower the reliability of the scale. This difficulty 
is obviated in practice since the individual rated chooses his own 
judges, thus possibly eliminating those who would rate him low merely 
because of personal feeling. 

The first step was to get the men to rate each other. In order to 
facilitate the tabulation of the results each man in the fraternity was 
given a number as a judge and a letter as a subject to be judged. 
It was first thought best to have each man judge six others at a time, 
selecting each group by chance, in order to eliminate the factor of 
fatigue on the part of the judge. This was done, but with the first 
and second groups of six a decided change in standards on the part of 
the judge was noted. That is, the judges tended to compare and 
rank the men within each group of six, instead of judging the men 
within the fraternity as a whole. In order to avoid this difficulty, 
all of the remaining rating blanks were given out at the same time, thus 
minimizing the possibility of errors due to the shifting of the standards 
used by the individual in making his judgments. 

The instructions given to the men were: “In order to carry out an 
investigation of the reliability of personal judgments, we are asking 
every man in the fraternity to rate every other man, using the Purdue 
Personnel Rating Scale. The thing that we want is a conscientious 
rating of the character and personality of each man, using the average 
man within the fraternity as a standard. The rest of the instructions 
will be found on the rating scale. You may be perfectly frank in 
your judgments, since all ratings will be anonymous.” It was thus 
made clear to the men that each was to use his own conception of 
the average man within the fraternity as a standard by which he was 
to judge the others. 

































A fine spirit and keen interest were displayed by everyone. This
was due to the fact that all were much interested in finding out what 
the rest of their fraternity brothers thought of them. This was possi- 
ble only if each gave a good rating of the rest. As a result, the ratings 
came in promptly and were rather carefully filled out. It is reasonably 
certain that each rating was as carefully analyzed as is usually true in 
ratings made in actual practice. 

After all the rating scales had been received, properly filled out, each
individual’s score was computed by taking the sum of the ratings on
the ten characteristics. These scores were arranged in tabular form,
with subjects judged on the x axis and judges on the y axis. The
arithmetic mean of all the judgments on each subject was computed.
The standard deviations (σ) of the distributions, as well as the
probable errors (PE) of the means, were then computed and arranged
in tabular form as found in Table I. All tables not shown in this
article because of limitations of space are on file in the Department
of Education, Purdue University.







TABLE I

Letters Indicate Subject. Average is Average of All Judgments on That Subject.
σ is the Standard Deviation of the Distribution. PEav. is the Probable Error
of the Average

Subject   Average   σ of     PEav.       Subject   Average   σ of     PEav.
rated               distr.               rated               distr.
A          29.0     4.31     ±.58        N          33.4     5.20     ±.70
B          31.1     4.97     ±.67        O          34.8     4.90     ±.66
C          30.8     5.42     ±.72        P          33.8     4.77     ±.64
D          35.8     4.90     ±.66        Q          31.8     5.28     ±.71
E          37.9     4.84     ±.65        R          35.3     4.36     ±.59
F          38.1     4.55     ±.61        S          30.9     3.97     ±.52
G          36.3     4.06     ±.55        T          38.3     4.60     ±.62
H          33.1     3.86     ±.52        U          33.8     4.97     ±.67
I          31.0     3.75     ±.51        V          36.8     5.09     ±.69
J          36.9     4.74     ±.64        W          35.9     4.68     ±.63
K          37.0     5.16     ±.69        X          34.1     5.24     ±.71
L          31.0     4.35     ±.59        Y          33.8     4.62     ±.62
M          35.0     5.10     ±.69        Z          31.9     4.03     ±.55
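The probable errors in Table I appear to follow the classical formula PE = .6745σ/√N. The Python check below assumes N = 25 judgments per subject, on the reasoning that each member of the fraternity of 26 was rated by the other 25. The table itself does not state N.

```python
import math

PROBABLE_ERROR_FACTOR = 0.6745  # one probable error = 0.6745 standard errors


def pe_of_mean(sigma: float, n_judgments: int) -> float:
    """Probable error of a mean of n_judgments ratings whose standard deviation is sigma."""
    return PROBABLE_ERROR_FACTOR * sigma / math.sqrt(n_judgments)


for subject, sigma, printed in [("A", 4.31, 0.58), ("N", 5.20, 0.70), ("H", 3.86, 0.52)]:
    print(subject, round(pe_of_mean(sigma, 25), 2), printed)
```

Most rows agree in this way, but a few, for example S, differ in the second decimal. Such differences may reflect rounding or misprints in the source.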








Averages of randomly selected judgments, varying in number from one
to thirteen, were then correlated. That is, correlations of
one judgment vs. one judgment, two judgments vs. two judgments and 
so on to thirteen judgments vs. thirteen judgments were obtained. 




Column 2 of Table II gives the number of chance combinations corre- 
lated for the different groupings. A total of 96 r’s was computed. 

Table II is a summary of the results of the entire investigation, 
since it shows the predicted and experimental reliability of the scale 
when used with given numbers of judgments. 
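The correlation of k judgments against k judgments can be sketched in Python as follows, under two simplifying assumptions. First, the table of judges by subjects is complete; in the study no man rated himself, so each subject lacked one rating. Second, each correlation uses two non-overlapping groups of k judges drawn at random. The function names and the toy data are hypothetical.

```python
from __future__ import annotations

import random
from statistics import mean


def pearson_r(x: list[float], y: list[float]) -> float:
    """Product-moment correlation of two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def random_group_r(ratings: list[list[float]], k: int, rng: random.Random) -> float:
    """Correlate, over subjects, the averages of two disjoint random groups of k judges.

    ratings[j][s] is the total score judge j gave subject s.
    """
    chosen = rng.sample(range(len(ratings)), 2 * k)
    first, second = chosen[:k], chosen[k:]
    n_subjects = len(ratings[0])
    avg1 = [mean([ratings[j][s] for j in first]) for s in range(n_subjects)]
    avg2 = [mean([ratings[j][s] for j in second]) for s in range(n_subjects)]
    return pearson_r(avg1, avg2)


# Toy data: six judges who each add a fixed personal leniency to a common score.
true_scores = [29.0, 31.1, 30.8, 35.8, 37.9]
toy = [[t + bias for t in true_scores] for bias in (-2, -1, 0, 0, 1, 2)]
rng = random.Random(1927)
print([round(random_group_r(toy, k, rng), 3) for k in (1, 2, 3)])
```

In the toy data, judges differ only by a constant leniency, so every correlation is 1. A correlation is unaffected by differences in general level. Only disagreement about the relative standing of subjects lowers it.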


TABLE II.—CORRELATIONS PREDICTED AND OBSERVED

Number of    Number   Predicted            Observed             Difference,
judgments    of r's   r          PE        average r   PE       observed − predicted
Ones           25                            .251     ±.040
Twos           23       .401     ±.111       .388     ±.020        −.013
Threes          8       .500     ±.099       .592     ±.043        +.092
Fours           6       .572     ±.087       .538     ±.041        −.034
Fives          10       .626     ±.080       .631     ±.046        −.005
Sixes           6       .677     ±.070       .661     ±.013        −.016
Sevens          5       .700     ±.067       .777     ±.003        +.077
Eights          3       .728     ±.062       .667     ±.040        −.061
Nines           2       .751     ±.058       .663     ±.027        −.088
Tens            2       .770     ±.051       .731     ±.015        −.039
Elevens         2       .786     ±.050       .834     ±.041        +.048
Twelves         2       .801     ±.048       .865     ±.034        +.064
Thirteens       2       .813     ±.046       .852     ±.044        +.039





The first column of numerals is the number of correlations in the 
average. 

Predicted r is reliability predicted by Spearman-Brown formula. 

Observed r is reliability obtained by experiment. 

Inspection of this table reveals certain interesting trends in the 
data: (1) A relatively rapid increase in reliability in the first few groups 
of correlated judgments. (2) A very close approximation to the pre- 
dicted value when the number of correlations is relatively large. (3) 
The last three values in the observed series are higher than those in 
the predicted series. This, however, may be merely a chance fluctua- 
tion due to limited sampling. 

A graphical representation of these results is given in Fig. 1. It
will be seen that the experimental value for the reliability falls outside
one probable error of the predicted value only three times, and
that it is always within one standard deviation. One reason for the
irregularity of the curve of experimental reliabilities is the relatively
small number of correlations that were possible in the groups
containing the larger numbers of judgments.
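The statement about probable errors can be checked from Table II. This sketch recomputes each difference from the printed r values and divides it by the PE of the predicted value. It finds the same three rows outside one PE (sevens, nines, twelves) and none beyond two PE. One PE is about 0.6745 standard deviations. The nines row, at about 1.52 PE, therefore lies marginally beyond one standard deviation (about 1.02) by the printed figures. "Within one standard deviation" is best read as approximate.

```python
# Check the Table II differences against the probable error (PE) of the
# predicted values. Figures are transcribed from Table II.
PE_IN_SD = 0.6745  # one probable error equals about 0.6745 standard deviations

# (row label, predicted r, PE of predicted r, observed r)
ROWS = [
    ("Twos", 0.401, 0.111, 0.388),
    ("Threes", 0.500, 0.099, 0.592),
    ("Fours", 0.572, 0.087, 0.538),
    ("Fives", 0.626, 0.080, 0.631),
    ("Sixes", 0.677, 0.070, 0.661),
    ("Sevens", 0.700, 0.067, 0.777),
    ("Eights", 0.728, 0.062, 0.667),
    ("Nines", 0.751, 0.058, 0.663),
    ("Tens", 0.770, 0.051, 0.731),
    ("Elevens", 0.786, 0.050, 0.834),
    ("Twelves", 0.801, 0.048, 0.865),
    ("Thirteens", 0.813, 0.046, 0.852),
]


def difference_in_pe(predicted: float, pe: float, observed: float) -> float:
    """Size of |observed - predicted| expressed in probable errors."""
    return abs(observed - predicted) / pe


ratios = {label: difference_in_pe(p, pe, o) for label, p, pe, o in ROWS}
outside_one_pe = [label for label, k in ratios.items() if k > 1]
largest_in_sd = max(ratios.values()) * PE_IN_SD  # convert PE units to SD units

if __name__ == "__main__":
    for label, k in ratios.items():
        print(f"{label:10s} {k:.2f} PE")
    print("outside one PE:", outside_one_pe)
    print(f"largest difference: {largest_in_sd:.2f} SD")
```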

Figure 2 is made from the same data, but with the curve of pre-
dicted values represented by a straight line, in order to show that the
probable error of the correlations decreases as the coefficient of
correlation increases. This fact is not evident from Fig. 1, since there
the PE is measured along the axes rather than along a line perpendicular
to the curve of predicted reliability. Holzinger and Clayton (5), in a
graph showing the results of a similar test of this formula applied to
spelling words, did not seem to take this into account, but seemed to
measure the probable error on a line perpendicular to the curve at the
given point. This did not give the true location of the PE curve, since
all the distances must be measured along the abscissa or the ordinate.
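Why the PE shrinks as r grows follows from the classical formula for the probable error of a correlation coefficient, PE = 0.6745(1 - r²)/√N. The sketch below assumes N = 26, the size of the fraternity group. The paper does not state which N it used, but with this value the formula lands close to the printed PE column for predicted r (about .111 at r = .401). For any fixed N, the factor (1 - r²) makes the PE fall steadily as r rises toward 1.

```python
import math


def probable_error_of_r(r: float, n_cases: int) -> float:
    """Classical probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(N)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n_cases)


# Hypothetical choice of N: the 26 men of the fraternity group.
N_ASSUMED = 26

if __name__ == "__main__":
    for r in (0.40, 0.50, 0.60, 0.70, 0.80):
        print(f"r = {r:.2f}  PE = {probable_error_of_r(r, N_ASSUMED):.3f}")
```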


SUMMARY 


A review of previous experimental work designed to test the validity 
of the Spearman-Brown prediction formula shows it to give a meaning- 
ful prediction on such materials as mental test items, spelling words, 
lifted weights, true-false test items in language, and component units 
of rating scales. 

The present investigation, using judges as equivalent to test items
in the sense required by the formula, concerned itself with correlations
between groups of judgments varying in number from 1 to 13. These judgments
were obtained by having each man in a fraternity group of 26 judge 
every other man on certain traits of character and personality as given 
in the Purdue Personnel Rating Scale. 


CONCLUSIONS


1. The reliability of the average of a number of judgments with 
this scale is higher than that of one judgment. 

2. The Spearman-Brown formula predicts the empirical reliability
obtained by experiment, up to and including thirteen judges, within
two probable errors or one standard deviation.

3. Although the increase in reliability diminishes as the number of 
judgments is increased, the decrease in per cent of error of the average 
judgment continues to be appreciable. 

4. If the Spearman-Brown formula holds for numbers of judgments
beyond thirteen, the number of judgments required for any desired
reliability can be readily determined. This generalization is of
considerable practical consequence in actual rating or judging
situations.
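Conclusion 4 rests on inverting the Spearman-Brown formula. Solving R = n·r/(1 + (n - 1)·r) for n gives n = R(1 - r)/(r(1 - R)). The sketch below assumes, as the conclusion does, that the formula continues to hold beyond thirteen judgments. It uses the single-judgment reliability of .251 from Table II; the function name is illustrative only.

```python
import math


def judgments_needed(r1: float, target: float) -> int:
    """Smallest whole number of judgments whose average reaches the target
    reliability under the Spearman-Brown formula, given single-judgment r1."""
    if not (0 < r1 < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    n = target * (1 - r1) / (r1 * (1 - target))
    # Subtract a tiny tolerance so floating-point noise cannot push an exact
    # whole number up to the next integer.
    return max(1, math.ceil(n - 1e-9))


if __name__ == "__main__":
    for target in (0.80, 0.90):
        print(target, judgments_needed(0.251, target))
```

With r = .251, a reliability of .80 would call for 12 judgments and one of .90 for 27.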


REFERENCES


1. Kelley, T. L.: “Statistical Method,” page 353.

2. Ibid., page 205.

3. Monroe, W. S.: “Theory of Educational Measurements,” page 205.

4. Holzinger, K. J.: Notes on the Use of the Spearman Prophecy Formula for
Reliability. Journal of Educational Psychology, Vol. XIV, page 302.

5. Holzinger and Clayton: Further Experiments in the Application of Spear-
man’s Prophecy Formula. Journal of Educational Psychology, Vol. XVI,
page 289.

6. Kelley, T. L.: The Applicability of the Spearman-Brown Formula for the
Measurement of Reliability. Journal of Educational Psychology, Vol. XV,
page 300.

7. Ruch, Ackerson and Jackson: An Empirical Study of the Spearman-Brown
Formula as Applied to Educational Test Materials. Journal of Educa-
tional Psychology, Vol. XVII, page 319.

8. Wood, Ben: Studies of Achievement Tests. Journal of Educational Psy-
chology, Vol. XVII, page 263.

9. Furfey, Paul: An Improved Rating Scale. Journal of Educational Psy-
chology, Vol. XVII, page 45.

10. Plice, Max Jennings: A Study of Some of the Psychological Aspects of
Personnel Selection with Special Reference to Purdue University, page 50.
(To be published in an early number of the Journal of Industrial Psychology.)

11. Monroe, W. S.: “Theory of Educational Measurements,” page 325.

12. Ibid., page 331.
































NOTES ON ARTICLES IN EDUCATIONAL
PSYCHOLOGY IN CURRENT ISSUES OF
OTHER MAGAZINES


REPORTED BY C. O. MATHEWS
Research Associate, Institute of Educational Research, Teachers College 
Columbia University 











MENTAL MEASUREMENT 


A Study of the Relation between Ability and Achievement. John R. McCrory. 
Educational Administration and Supervision, Oct., 1926, 481-490. This study 
of students in a teacher-training institution suggests the use of a combination of 
intelligence test scores and high school marks for predicting success in college. 
The author thinks that other measures of character traits would increase the 
accuracy of prediction. 

Mental Differences in Children Referred to a Psychological Clinic. The Journal 
of Applied Psychology, Dec., 1926, 470-486. The author reports on the mental 
differences discovered in 467 clinic cases in relation to their racial, social,
economic and educational status.

The Influence of Practice in Mental Tests. C. S. Slocombe. The Forum of
Education, Nov., 1926, 173-179. The influence of practice within a test as well 
as the influence of an entire test upon subsequent tests is reported. 

Classification of Freshmen at Brown University. A. H. MacPhail. Journal 
of Educational Research, Dec., 1926, 365-369. The relationship between students’
scores on psychology and English tests and their class grades is shown.

The Development of Early Behavior Patterns in Young Children. Mary Cover 
Jones. The Pedagogical Seminary and Journal of Genetic Psychology, Dec., 
1926, 537-585. Ten reactions were systematically studied using as subjects 365 
infants in New York Baby Welfare Stations. Age norms and curves of develop- 
ment are given. 

An Intelligence Test for Stenographers. Sadie Myers Shellow. The Journal 
of Personnel Research, Dec., 1926, 306-308. The method of constructing the 
test and validating it is given with data showing the relationship of scores on it to 
scores on a trade test. 

The Constancy of the IQ of Mental Defectives. Blanche M. Minogue. Mental 
Hygiene, Oct., 1926, 751-758. Seventy-two per cent of the cases showed no real 
change in IQ. There appeared to be a tendency for males to vary more than 
females and the variation was a loss more often than a gain. 


EDUCATIONAL MEASUREMENT 


A Series of Tests for the Measurement and Diagnosis of Reading Ability in Grades
3 to 8. Arthur I. Gates. Teachers College Record, Sept., 1926, 1-23. This is a
description of the four types of tests in the series with definite suggestions for their 
practical use in diagnosis. 




The Gates Primary Reading Tests; Their Uses in Measurement, Diagnosis, and 
Remedial Instruction. Arthur I. Gates. Teachers College Record, Oct., 1926, 
146-178. A series of tests useful in measuring and diagnosing three phases of 
reading ability in children of the first two grades is described. 

Reliability of the Thorndike-McCall Reading Scale. S. C. Garrison and M. S.
Robertson. Peabody Journal of Education, Nov., 1926, 162-164. Forms 1 to 8 
of this test were given at intervals of one month to an eighth grade class of 42 
pupils in a demonstration school. 

The Achievement of College Students in Freshmen Rhetoric. M. J. Van Wagenen.
Educational Administration and Supervision, Dec., 1926, 603-617. The results 
are based upon the scores of several hundred compositions each rated on thought 
content, structure, and mechanical perfection. 

The Administration of Placement Examinations. T. A. Langlie. School and 
Society, Nov. 13, 1926, 619-620. Data seem to show that training tests of 
Mathematics and Chemistry were more effective as placement tests when they 
were given after two weeks of class work rather than before classes began.

New Type Examinations in the College of Physicians and Surgeons. Ben D.
Wood. The Journal of Personnel Research, Oct., 1926, 227-234, and Nov., 1926,
277-283. The validity and reliability of these tests as well as evidence as to their 
superiority over the essay type are given. 

Tests and Measurements in Physical Education at Western Reserve University, 
Cleveland, Ohio. Thomas Neil. American Physical Education Review, Oct., 
1926, 1018-1024. “This is an attempt to measure the development of a student’s
mental ability to concentrate and respond accurately to complex directions, and to 
determine the relationship of this development to various types of instruction in 
physical education.” 

Factors in the Achievement of College Freshmen. Robert M. Bear. School and 
Society, Dec. 25, 1926, 802-804. The various factors studied in relation to 
achievement were fathers’ occupation, vocational purpose, participation in 
athletics, age and geographical location of home. 

The Available Tests for Results of Teaching the Sciences. S. G. Rich. School
Science and Mathematics, Nov., 1926, 845-852. A list of available standardized 
tests of science is given with suggested standards by which to judge their usefulness. 

Norms of Scientific Ability and Achievement in the High School. E. R. Downing.
School Science and Mathematics, Oct., 1926, 717-721. The construction of a
test in which one of five words is to be crossed out is described together with the 
results of the test in classes of six Chicago high schools and two Toledo high 
schools. 

The Selective Value of Powers’ General Chemical Test, Scale “A.” Lawrence
Edward Stout. Journal of Chemical Education, Oct., 1926, 1138-1143. In the 
case of 65 pupils with one year of high school chemistry the Powers’ test gave 
better predictions of achievement at the beginning of the course (r = .661) than 
at the end of thirty-six weeks (r = .491).

The Continuity Test in History-teaching. Howard E. Wilson. The School 
Review, Nov., 1926, 679-684. The author contributes a method for scoring 
tests of the type that contain items to be numbered in order of occurrence so as 
to measure finer differences than previously suggested methods. 
































A Suggestion as to Correcting Guessing in Examinations. Arnold M. Christ- 
ensen. Journal of Educational Research, Dec., 1926, 370-374. The plan sug-
gested provides for giving a true-false and a multiple choice test of “identical”
items and letting the final score be the number of pairs of corresponding items 
correctly marked. 
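The scoring rule summarized above can be sketched as follows. The sketch assumes each item on both forms is simply marked right or wrong; the list-of-booleans representation and the function name are illustrative, not Christensen's. A lucky guess on one form earns nothing unless the matching item on the other form is also answered correctly.

```python
def paired_score(tf_correct: list[bool], mc_correct: list[bool]) -> int:
    """Count item pairs answered correctly on BOTH the true-false and the
    multiple-choice version of the same item."""
    if len(tf_correct) != len(mc_correct):
        raise ValueError("both forms must have the same number of items")
    return sum(1 for tf, mc in zip(tf_correct, mc_correct) if tf and mc)


if __name__ == "__main__":
    print(paired_score([True, True, False, True], [True, False, True, True]))
```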

The Construction of a Test to Measure Mathematical Ability. C. A. Stone. 
School Science and Mathematics, Nov., 1926, 824-831. The procedure in 
constructing a test designed to measure the pupil’s ability acquired through 
mathematical training is related.

Psychological Tests of Mathematical Ability. Leslie Fouracre. The Forum of 
Education, Nov., 1926, 201-205. This is an attempt to test the mathematical 
ability of children as distinct from mathematical achievement and mental ability.

Testing Mechanical Ability by the MacQuarrie Test. S. D. Horning and R. S. 
Leonard. The Industrial Arts Magazine, Oct., 1926, 348-350. A study is 
reported of the relationships discovered with high school boys between results 
on the Terman Group Test, the MacQuarrie Test for Mechanical Ability and a 
rather complicated mechanical construction. 

The Influence of Sex on Scholarship Ratings. Donald G. Paterson and T. A.
Langlie. Educational Administration and Supervision, Oct., 1926, 458-468. 
Evidence shows that girls rank higher than boys on scholarship ratings when 
these are based on older types of examinations and ratings but that the differences 
between the sexes vanish when scholarship is based upon more objective data. 

College Grades. Esther Allen Gaw. School and Society, Nov. 20, 1926, 
648-651. A plan of reducing letter grades to sigma indices is suggested in order 
to make grades more comparable. 


PSYCHOLOGY OF LEARNING


The Influence of Context upon Learning and Recall. Shuh Pan. Journal of 
Experimental Psychology, Dec., 1926, 468-491. Pairs of words were memorized 
in various contextual materials and out of context and the influence of these 
environmental factors upon learning and recall was determined. 

Mental Fatigue of Indians of Nomadic and Sedentary Tribes. Thomas R. 
Garth. The Journal of Applied Psychology, Dec., 1926, 437-452. The nomadic 
tribes proved able to resist mental fatigue to a greater extent than the seden-
tary tribes, both groups being composed of full-blood subjects.

Effect of Serial Position upon Memorization. Edward S. Robinson and Martha
A. Brown. The American Journal of Psychology, Oct., 1926, 538-552. In this 
study primacy is shown to be more advantageous than finality of serial position. 
The relationship between position in series and speed of learning is demonstrated. 

A Qualitative (Experimental and Theoretical) Study of Productive Thinking 
(Solving of Comprehensible Problems). Karl Duncker. The Pedagogical Seminary 
and Journal of Genetic Psychology, Dec., 1926, 642-708. This is a critical
review of the theories of thinking with the statement and demonstration of what
the author considers a more nearly correct theory. 


PSYCHOLOGY OF SCHOOL SUBJECTS


Specific Elements Making for Proficiency in Silent Reading, When General Intelli-
gence is Constant. Luella C. Pressey. School and Society, Nov. 6, 1926, 589-592.
The influence of specific elements upon proficiency was determined by testing
56 poor and 56 good readers paired for intelligence, chronological age and sex.

Children’s Interest in Poetry. Miriam Blanton Huber. Teachers College 
Record, Oct., 1926, 93-104. More than 50,000 children and expert teachers 
indicated interests in 573 carefully selected poems. A list of the more important 
poems is given by grades. 

The Attitude of Junior High School Pupils toward English Usage. Edith E. 
Shepherd. The School Review, Oct., 1926, 574-586. This is a study of pupils’ 
attitudes based upon evidence showing the correlation of accuracy in written 
work in other subjects with accuracy in English. 

Proficiency in Outlining. Carter V. Good. The English Journal, Dec., 
1926, 737-742. The extensive treatment of a topic is shown to cause the main 
and subheads to stand out more clearly in the reader’s mind than the condensed 
treatment of the same topic. 

An Experiment with the Horn-Ashbaugh Speller. C. A. Lentz. Chicago 
Schools Journal, Dec., 1926, 135-146. The value of supplying pupils with tests, 
the importance of the Horn-Ashbaugh method and a comparison between this 
method and the Chicago system were the three problems attacked. 

Solving Arithmetic Problems II. Carleton W. Washburne and Raymond 
Osborne. The Elementary School Journal, Dec., 1926, 296-304. This is an 
experimental study of three methods of solving arithmetic problems in which 
results are based upon records of 763 children in eighteen different schools. 

A Study of Percentage in Grade VIII A. H. W. Kircher. The Elementary 
School Journal, Dec., 1926, 281-289. The results show that a great preponder- 
ance of the 133 children tested had a very confused knowledge of percentage. 

Diagnosis and Training in Advanced High School Algebra. R. W. Yingling. 
School Science and Mathematics, Oct., 1926, 729-734. A number of pupils 
low in algebra were aided in increasing their ability by being segregated into 
groups and given specialized training. The Holtz Algebra Scales were used 
throughout the experiment. 

Basic Concepts in Contemporary Life. H. Meltzer. Journal of Educational 
Research, Dec., 1926, 356-364. This is a count study of the basic concepts con- 
tained in critical magazines and books in the field of social studies. A list of 297 
concepts arranged in rank order is included. 

Print-script and Cursive-script in Schools: An Investigation in Nervo-muscular 
Re-adjustments II. W. H. Winch. The Forum of Education, Nov., 1926, 206-222.
This is the second part of six carefully controlled experiments to determine
the relative facility with which children trained with one type of writing can adopt 
another kind. 


CHARACTER AND PERSONALITY 


Ratings of Downey Will-temperament Traits. George D. Stoddard and G. M. 
Ruch. The Journal of Applied Psychology, Dec., 1926, 421-426. The relation- 
ship between ratings on the Will-temperament Test and ratings of these same 
traits by friends was studied. Individuals were unable to recognize their own 
profiles and those of their roommates among five profiles. 

The Relation between Personality and Character Traits and Intelligence. Mary 
Goodyear Earle. The Journal of Applied Psychology, Dec., 1926, 453-461. The
average relationship between Army Alpha scores and ratings on nine traits
contributing to the ideal trained nurse is insignificant.

Some Factors Involved in Judging Personal Characteristics. Roy M. Dorcus.
The Journal of Applied Psychology, Dec., 1926, 502-518. The factor of time is 
studied by the use of a contact pencil. The differences in time required to judge 
various traits as well as differences in time required to judge the same traits in 
friends and in selves are pointed out. 

Reliability of Average Ratings. Arthur W. Kornhauser. The Journal of 
Personnel Research, Dec., 1926, 309-317. The graphic rating scale with seven 
traits is used on two groups of college students and reliabilities of averages found. 

The Effect of Suggestion on the Judgment of Facial Expressions of Emotion. 
Ellen Jarden and Samuel W. Fernberger. The American Journal of Psychology, 
Oct., 1926, 565-570. In general, suggestion greatly increased the percentage
of recognition of facial expressions of emotion.

Interest as an Indication of Ability. Richard S. Uhrbrock. The Journal of
Applied Psychology, Dec., 1926, 487-501. Two hundred fifty-three male college
students were asked to state whether their primary interest was in things, people
or ideas. Relationships were then found between these replies and intelligence and
mechanical aptitude scores. 

The Negro Child’s Index of More Social Participation. Harvey C. Lehman and 
Paul A. Witty. The Journal of Applied Psychology, Dec., 1926, 462-469. It 
is shown that negro children between 8½ and 15½ years of age consistently engage
in more social forms of play than do white children. 

Are There Cases in Which Lies are Necessary? B. E. Tudor-Hart. The 
Pedagogical Seminary and Journal of Genetic Psychology, Dec., 1926, 586-641. 
Children’s opinions were collected on this question both in Vienna and in the
United States and an analysis made of the replies. 

A Study of Dishonesty among the Students of a Parochial Secondary School. 
Edward V. Chambers. The Pedagogical Seminary and Journal of Genetic 
Psychology, Dec., 1926, 717-728. Students were asked to give their opinions on 
each of a series of stories in which dishonesty is involved, to answer a questionnaire 
and to take tests so arranged that cheating could be detected. 


MISCELLANEOUS 


Some Findings in Reference to the “Gang Instinct.” Harvey C. Lehman and
Paul A. Witty. The High School Quarterly, Oct., 1926, 15-21. The tendency
to be with the “gang” does not seem to increase or decrease rapidly at
certain ages in the child’s life. The data of this study show that
girls under 13½ years are with the gang less than boys, but after this age the
situation is reversed.

Opinions of College Students. Edward S. Jones. The Journal of Applied 
Psychology, Dec., 1926, 427-436. The degree of conviction of 418 college men and 
women on a number of political, social and religious questions is reported. 

Prevalence of Certain Popular Misconceptions. H. E. Garrett and T. R.
Fisher. Journal of Applied Psychology, Dec., 1926, 411-420. The frequency with 
which forty misconceptions are held by 140 senior high school girls and 115 senior 
high school boys was determined by administering a true-false test. 




A Case of Special Ability with below Average Intelligence. June E. Downey.
The Journal of Applied Psychology, Dec., 1926, 519-521. Exceptional visual 
memory for figures was found in an individual of mediocre comprehension. 

The Principle of the Nomograph in Education. John C. Almack and William 
G. Carr. Journal of Educational Research, Dec., 1926, 340-355. The principles 
underlying the construction of the nomograph and examples of their application to 
educational data are given. 













NEW PUBLICATIONS IN EDUCATIONAL
PSYCHOLOGY AND RELATED FIELDS OF
EDUCATION


CONDUCTED BY JOHN HOCKETT 
The Lincoln School of Teachers College 











POST MORTEM MENTAL TESTS FOR EMINENT MEN


Genetic Studies of Genius, Vol. II, The Early Mental Traits of Three 
Hundred Geniuses, by Catharine Morris Cox, assisted by Lela O. 
Gillan, Ruth Haines Livesay, Lewis M. Terman. Stanford 
University Press, 1926. Pp. XXIII + 842. $5.00. 


This volume offers a massive, almost formidable, expression of Pro- 
fessor Terman’s “life long interest in the psychological aspects of 
biography, particularly in the precocious indications of superior mental 
ability.” 

Beginning in 1922, Dr. Cox selected, from Cattell’s objectively
determined list of eminent men, a group of 282 (Group A) obtained by
eliminating those born before 1450, those whose eminence might have 
been due to position in the hereditary aristocracy and nobility, those 
below the 510th in rank of eminence, and eleven others for whom no 
records were available. A preliminary study was made upon a group 
(Group B) of one hundred cases selected in similar fashion but below 
the 510th in rank of eminence. The reliability and validity of historio- 
metric procedures were considered as demonstrated in such earlier 
work as that of Woods and Stern, of Candolle, Odin, Ellis, Clarke, 
and Cattell. Leading biographies in English, French and German 
were examined, and all statements of fact, quoted letters, poems, 
essays, and other material which could throw light on the problem 
of achievement during the first twenty-five years of life, were recorded. 

This resulted in an average of about twenty typewritten pages for 
each person. 

These data were rated on a seven-point scale from “1,” signifying
“equal to Stanford-Binet as a basis for determining mental ability,”
to “7,” signifying “mere guess, based on no data.” On a similar scale,
fractional parts of a complete Stanford-Binet were estimated and the
reliability of each approximated by the Spearman-Brown prophecy
formula. Such an ingenious procedure made it possible to express
the reliability of the data presented for each individual. The relia- 
bilities range from .00 to .82 with a mean of .46 for the data on the 
period from birth to age 17 and a mean reliability of .53 for the data 
on the later period, 18-26. 

The data were then examined by three or more persons, each indi- 
cating the ratio between the age at which the genius accomplished a 
given feat, and the age at which present knowledge of the intellectual 
achievements of normal children indicates that the average child 
would accomplish similar tasks. The ratings of Terman, Cox, and 
Merrill were finally used for all cases. These showed an average inter- 
correlation of .73. Statistical investigation showed that differences 
separated by five points were certainly real differences, except perhaps 
in the interval IQ 170 to 175. Difficulties appeared in comparing 
records of childhood with records of young manhood, so two IQ’s 
were determined for each individual. The AI IQ covers the period
through age 17; the AII IQ covers the period from 18 to 26.
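The rating procedure described above amounts to the ratio definition of IQ. As a sketch, assume the usual form: 100 times the age at which the average child accomplishes a feat, divided by the age at which the subject accomplished it. The review does not spell out the exact computation, and the figures in the example are hypothetical. On that reading, a boy who did at 8 what is typical at 12 would receive an IQ of 150.

```python
def ratio_iq(age_of_feat: float, normative_age: float) -> float:
    """IQ as 100 times the ratio of the age at which an average child performs
    a task to the age at which the subject actually performed it."""
    if age_of_feat <= 0:
        raise ValueError("age_of_feat must be positive")
    return 100 * normative_age / age_of_feat


if __name__ == "__main__":
    # Hypothetical example: a feat typical of age 12 accomplished at age 8.
    print(ratio_iq(8, 12))
```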

The following abbreviated discussion of the reasons for the rating 
given to two men (Herder and Prescott) who appeared near the aver- 
age of the group both in IQ and in the adequacy of the data, will 
indicate the process used, and the type of reading matter which crams 
the interesting Chapters VI to X. 

“The standing of Herder’s parents allows the son a guessed IQ 
of not less than 110. The community in which young Herder lived 
and the school which he attended rank him somewhere near the same 
figure. The personal characterization, school standing, and intellec- 
tual interests raise the rating to 130. Herder ‘seemed different from 
other children. He spent much of his time alone and was always 
grave and serious.’ The fact that a fistula in his eye caused him pain 
throughout the period of his childhood increases the significance of 
his youthful achievements. Herder’s ‘thirst for knowledge, his pleas- 
ing manners, and his rapid progress’ made him the favorite of his exact- 
ing and thorough teacher, an ardent scholar. The latter selected the 
boy from his class as suitable material for the university and gave him 
private instruction in Greek and Hebrew. The boy’s passion for 
reading became so great that he had to be forbidden the use of books 
at his meals. If he knew of any book in the village he borrowed and 
read it. In less circumscribed surroundings this description might 
easily fit an IQ of 140. The rating of 130 is not too high. 

“Prescott is entitled by parental standing to a guessed IQ of 120. 
Attendance at a school of exceptional standing suggests a rating of
125. Prescott’s ability and many of his characteristics—his ‘inquisi- 
tive mind, quick perception, and ready retentive memory’—are char- 
acteristics of a gifted youth. His superior performance in the college 
entrance examination bears out this interpretation. But the raters 
agree that the evidence does not warrant a rating above 130 because, 
although Prescott was a great reader, his reading was largely limited to 
fiction; he disliked mathematics and metaphysical discussions and 
speculations and, although he formed ‘good resolutions’ for study, 
he was careful never to exceed the number of hours scheduled or to 
do more than was required. The single written communication 
preserved from his hand in an available source and dated from his 
16th year has received from three raters an average composition IQ 
rating of 125. A rating of 130 is not too high for the child Prescott. 
A higher rating is probably not warranted by the data.” 

Treated in such fashion the IQ’s range from André Masséna, Duc 
de Rivoli and Prince D’Essling (AI IQ 100, AII IQ 105) to Leibnitz
(AI IQ 185, AII IQ 190) and John Stuart Mill (AI IQ 190, AII IQ 170).
The records as given indicate an average IQ between 135 and 145 
for the entire group. The authors regard certain corrections as justi- 
fiable because the IQ’s regularly increase with increase in the relia- 
bility of the data. This may in part be due to the greater interest of 
biographers in the achievements of unusual children, but is interpreted 
as due primarily to the superiority of each of these persons to the aver- 
age member of groups in which these geniuses did, indeed, participate. 
Correcting for such errors, the authors believe the true average of the 
group to be not lower than 155 to 165 IQ. A correlation of .25 was 
found between IQ and rank order of eminence, which remains .16 when 
reliability of data is held constant. The authors are inclined to make 
much of this relationship, although it amounts to a difference of only 
4 points (AI IQ) or 12 points (AII IQ) between the first ten in eminence 
and the last ten in eminence. 

Of keen interest to the reviewer was the attempt, described in
Chapters XI and XII, to rate the character traits of a representative
portion of Group A (Group C). Unfortunately, it seemed necessary
to rate each genius in comparison with the “average person,” while no
attempt was made to obtain a control description of the character
traits of average persons. Though the authors are aware of the errors
into which Pearson was led when attempting to estimate the childhood
intelligence of Galton without norms for comparison, they can hardly
hope to have avoided similar errors in these ratings. The ratings covered environmental
opportunities, intellectual interests, and 67 character traits, largely 
those used in Webb’s study. The reliability measured from first to 
second rating by the same person was .78; between one rater and
another it was .53. The usefulness of these ratings is somewhat obscured
by an unjustified attempt to group the traits into such theoretically
unified types as “emotional,” “social,” “self,” “intellectual,” “activity,”
and combinations of these. However, the profiles of groups,
and Table 49 (opposite page 212), afford a happy hunting ground for
anyone of curious statistical bent. Making due allowance for the fact
that the two competent ladies who did the rating would be hard put 
to it to define just what constitutes a normal or average small-boy 
manifestation of reserve, or imaginativeness, or courtesy, or conven- 
tionality, or pure-mindedness, and for the fact that the biographies 
were not only read but written by those disposed to find statesmen 
tactful, philosophers intellectual, artists temperamental and saints 
kindly, there appear interesting divergences within the data. 

The chief difference between the most intelligent and the most 
eminent is not the greater will-power and perseverance of the latter 
(as the author mistakenly suggests) but appears to be the duller 
geniuses’ greater loyalty to friends and causes, more pure-mindedness, 
more sense of corporate responsibility, and less esteem for their own 
talents. It appears that one might trust his pocketbook to a religious 
leader but never to an artist, tell his jokes to musicians but never to 
soldier-statesmen. For the worst extroverts seek these same soldier- 
statesmen, but for introverts choose soldier-fighters! (?) For good 
cheer keep company with statesmen or musicians, for kindliness choose 
the dull or the religious, but expect little modesty and reserve from the 
latter. 

If the comparison with the concept of the average child means 
anything, the geniuses as a group were above the average in all desirable 
traits except that they were more liable to extreme depression, more 
liable to anger, more eager for the admiration of the crowd and less 
conventional. The raters are inclined to find geniuses most remark- 
able for traits showing strength or force of character, activity and 
intellectual zeal, but no different from the average in balance 
and emotional characteristics. (Any representatives of the mental 
hygiene movement who believe that the liberation of greatness 
requires an unusual degree of emotional stability will please take 
notice!) “We may conclude that the following traits and trait
elements appearing in childhood and youth are diagnostic of 

future achievement: an unusual degree of persistence, tendency 
not to be changeable, tenacity of purpose, perseverance in the face 
of obstacles, intellective energy, mental work bestowed on special 
interests, profoundness of apprehension, originality of ideas, and 
vigorous ambition expressed by the possession to the highest 
degree of desire to excel.” 

Part II, which offers over 500 pages of abbreviated biographies of 
the childhood and youth of the most eminent men of our civilization,
promises to serve any number of purposes, from those of the school 
child who must write an essay, to those of the research worker 
who wishes to mine further this wealth of information regarding the 
great and the near-great. 

Readers will share the editor’s hope that this study may be followed 
by other intensive studies of the assembled data, for light upon mental 
inheritance, interests, specialized abilities, environmental influences, 
physical illnesses, etc. With even deeper interest they will watch the 
consummation of the larger plan which involves not only this study 
of the childhood of the eminent, but also the continued observation of 
the most able children, searching for the conditions under which excep- 
tional intelligence bears the fruits of genius. 

Goodwin B. Watson.
Teachers College, Columbia University. 


A BRITISH MANUAL FOR LABORATORY EXPERIMENTS


A First Laboratory Guide in Psychology, by Mary Collins and James 
Drever. New York: E. P. Dutton and Co., 1926. Pp. VIII + 108.


This little book, which is intended to serve as a laboratory manual 
for beginning students of experimental psychology, was written by 
the authors to accompany their recent textbook in the same field. 
The experiments are for the most part of the conventional variety, 
and, like the textbook, show the influence of Myers’ earlier books.
There are fifty experiments in all, twenty of which are marked “Sup-
plementary.” The arrangement of the experiments is good; twelve
simple preliminary experiments pave the way for the more difficult 
ones to follow. 

Experiments range all the way from the old “standbys” on sensa- 
tion and psychophysics to the newer ones involving conditioned reflex 
methods, the psychogalvanic reflex, and the “G” factor. (Without 
the last named, no British text on psychology would be complete!) 
The authors state in their preface that the experiments are described 
in sufficient detail to permit students to perform them without the 
assistance of an instructor, and this is probably true of the simpler 
ones on sensation, attention, learning, and association. Whether 
beginning students—or even those fairly well advanced—could carry 
out the experiments on reaction time, or the conditioned response, from 
the very meager descriptions given in the manual, is at least doubtful. 
However, apart from the all too scant instructions given for these more 
complex experiments, the directions are generally clear and 
understandable. 

Like the textbook, this manual suffers (but to a somewhat lesser 
degree) from its insular and peculiarly British character. Presum- 
ably the authors are not familiar with a great deal of the recent and 
important experimental work done in this country, for otherwise it is
difficult to see how they could have managed to ignore it with so high 
a degree of success. More than 50 per cent of the references given in 
the Appendix and the footnotes of the textbook are to British sources; 
in the chapter on “Attention,” for instance, while James and Pills-
bury are quoted, no reference is made to Geissler, Dallenbach, or
Morgan; in the chapter on “Action,” references are made to McDougall
and to Titchener; Cattell, Henmon, and the mass of work on reaction
time done by Cattell’s pupils are ignored. In the chapters on “Imagery
and Association” and “Imagination and the Higher Thought Processes,”
little use is made of Betts’ work (though he is given as a reference)
and much is made of Galton and his time-honored breakfast table.
The work of Angell, Bentley, Fernberger, and Hollingworth on judg- 
ment and imagery is not mentioned; Woodworth is mentioned once by 
name, but nothing is said of the work of his pupils, Clark and Heid- 
breder, on thinking. A slight concession to modernity is made in 
several pages devoted to the Gestalt theory and in the page given to 
the conditioned response. All in all, however, these two books are 
almost exclusively British products. 

Among those who are looking for a simplified edition of Myers’ 
Textbook of Experimental Psychology, rather clearly written, these 
books will find favor. As they are intended for beginners, the fact 
that they omit much relevant material may not be considered impor- 
tant; for after all, beginners are not expected to have a wide or very 
critical knowledge of the field. Henry E. Garrett.
Columbia University. 




PSYCHOLOGY AND VOCATIONAL ADJUSTMENT


Principles of Employment Psychology, by Harold E. Burtt. Boston: 
Houghton Mifflin Company, 1926. Pp. VII + 568. $3.00. 


This book is designed “to give a fairly comprehensive account of
the principles involved (in Employment Psychology) for the use of
students preparing for practical psychological work in industry,”
and to “show the practical man the importance of painstaking scien-
tific technique in employment psychology in contrast with the expedi-
tious but unreliable methods of unscientific pseudo-psychology.”
It contains adequate treatment of the topics that have become tradi- 
tional in books of this type: Phrenology and Character Analysis, 
Mental Tests, Rating Scales, Intelligence, Interests, Trade Tests, 
Job Analysis, etc. 

One admirable feature of the book is its profuse citation of con- 
crete cases. Instead of dry, technical descriptions of the various pro- 
cedures followed in Employment Psychology, the author illustrates 
with numerous examples chosen from the literature on the subject. 
Copious references are made to original investigations in a bibliog- 
raphy numbering 692 items. 

It is to be regretted that the author, a psychologist of repute, 
complacently subscribes to the popular terminology used by the com- 
mercial purveyors of tests. He even goes out of his way to defend 
the practise of designating this test as one for “Attention,” that
one as a test for “Memory,” in spite of the fact that investigations
of many years’ standing have conclusively demonstrated the fallacies
implicit in such appellations. For a serious psychologist to perpetuate 
such erroneous beliefs even through the apparently harmless medium 
of terminology is certainly to retard the progress of truth. 

A critical statistician would also raise the brows to see the charac- 
terizations “fair” and “rather high” applied to indices of correlation 
represented by .40, .38, and .37, entirely innocent, as well, of Probable 
Errors. One wonders why so much emphasis is laid in the book on 
results obtained with the precarious and abused technique of correla- 
tion since most of the vocational researches in which it has been 
applied have used only a small number of cases, have not been suited 
to the commonly used “r” or “R” because of the absence of a straight
line of regression, and have only rarely mentioned the Probable Error 
of r when used. On the other hand one is pleased to note that Dr. 
Burtt dwells at some length on a more reliable tool for showing cor- 
respondence between test performance and vocational proficiency, 
namely the Probability Table. 
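The probable error the reviewer finds missing was, in the notation of the period, usually computed as PE = 0.6745(1 − r²)/√N. The sketch below is a minimal illustration, not anything taken from Dr. Burtt's book: it applies that formula to a coefficient of .40, and the sample sizes are hypothetical, since the review does not report them. The ratio r/PE is printed because a coefficient was often regarded as trustworthy only when it exceeded about four times its probable error; treat that threshold as a rough rule of thumb rather than a fixed standard.

```python
import math


def probable_error_r(r: float, n: int) -> float:
    """Classical probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(n)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    if n < 2:
        raise ValueError("n must be at least 2")
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# HYPOTHETICAL sample sizes: the review does not give N for these coefficients.
for n in (25, 50, 100):
    pe = probable_error_r(0.40, n)
    # r/PE: how many probable errors the coefficient lies away from zero.
    print(f"r = .40, N = {n:3d}: PE = {pe:.3f}, r/PE = {0.40 / pe:.1f}")
```

With only a few dozen cases the same r of .40 sits close to that four-PE line, which is the reviewer's point about small samples.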

With the exception of the dubieties just mentioned, the book is 
scientifically sound. It is written in an interesting manner, some- 
times approaching the free and easy, and is packed as full of meat 
as 568 pages could be. 

In a final chapter on the Outlook for Employment Psychology the 
author gives a panoramic view showing great activity in this field. 
Indeed, the entire book with its numerous descriptions of actual experi- 
ments, gives an impression of intense activity. Unfortunately the 
author is frequently obliged to inject the warning that this or that 
technique is still in an experimental stage or has only theoretical 
standing at present. But the entire book depicts the field of voca- 
tional adjustment as one of fascinating possibilities through which 
may come potent contributions to the social welfare and, reflexly, to 
the science of psychology. Harry Dexter Kitson. 

Teachers College, Columbia University. 





CORRELATION INQUIRY INTO HIGH-SCHOOL ACHIEVEMENT 


A Detailed Analysis of Achievement in the High School, by Cecile White 
Fleming. New York: Teachers College, Columbia University, 
Contributions to Education, No. 196, 1925. Pp. XI + 209. 


The basic data of this study are the scores of 285 pupils in the Hor- 
ace Mann High School for Girls and of 268 pupils in the Horace Mann 
High School for Boys in school achievement (composite of school 
grades), six separate school subjects, three intelligence tests, one 
reading test, Briggs Form Test, a composition test, three Latin tests, 
speed of movement, freedom from inertia, speed of objective decision, 
assurance of verbal memory, coordination of impulses, volitional per- 
severance (the last six as measured by the Carnegie Test IX adapted 
from Downey), the Stanford Achievement Test, health, energy, intelli- 
gence, school attitude, emotional balance, leadership, conscientious- 
ness, industry, will and perseverance, prudence and forethought, 
desire to excel (the last eleven from teacher ratings). These sets of 
scores were put in the correlation mill to produce several hundred 
coefficients of the zero order, partials of the first and second orders, and 
multiple ratios. The presentation of these coefficients in some fifty 
odd tables, the explanation of the same, the account of statistical 
procedure, the canvass of the literature of the field, the listing of 127 
references, and the summary of findings constitute the book. 
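Assuming the “multiple ratios” are multiple correlation coefficients, the two-predictor case can be obtained directly from zero-order coefficients: R²(1.23) = (r12² + r13² − 2·r12·r13·r23)/(1 − r23²). The sketch below uses hypothetical coefficients, not values from Fleming's tables; a prediction from four measures, such as the combination mentioned later in the review, would need the general matrix form instead of this closed formula.

```python
import math


def multiple_r(r12: float, r13: float, r23: float) -> float:
    """Multiple correlation of criterion 1 with predictors 2 and 3 combined.

    r12, r13: correlations of the criterion with each predictor.
    r23: correlation between the two predictors.
    """
    if abs(r23) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r12**2 + r13**2 - 2.0 * r12 * r13 * r23) / (1.0 - r23**2)
    return math.sqrt(r_squared)


# HYPOTHETICAL coefficients, for illustration only.
print(f"R = {multiple_r(0.60, 0.50, 0.30):.3f}")
```

When the predictors are uncorrelated (r23 = 0) the formula reduces to R² = r12² + r13²; overlap between predictors lowers the gain from adding the second one.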

The importance of these innumerable correlation data appears 
to lie in their suggestiveness and comparative value rather than in 
any intrinsic conclusiveness. The selected character of the pupils 
measured, the doubtful or unknown reliability of many of the 
measures themselves, and the sphinx-like nature of correlation coef- 
ficients in general serve to make the author herself very cautious in 
her deductions. In her opinion, the data suggest that “intelligence
becomes increasingly important as the pupil advances through the
high school”; that “the most significant factor next to estimated
intelligence in its association with scholarship appeared to be the
quality, or composite of qualities, defined as school attitude”; that
the qualities “most highly associated with estimated leadership
were: physical energy, school attitude, desire to excel, will and persist-
ence, and health”; that a combination of intelligence, school attitude,
physical energy, and chronological age scores serves very well to
predict school achievement; that “the tests of will-temperament, Car-
negie IX, showed low and unreliable correlations with school achieve-
ment and leadership”; that teachers’ estimates of intelligence lay
“almost equal emphasis upon intelligence as measured by the objective
Terman Mental Test, and industry as estimated.” The final conclusion,
tentatively advanced, is that “The two types of school success, aca-
demic achievement and school leadership, appear to differ in their
comparative correlations with mental ability and other traits of per- 
sonality. School marks appear to be associated more closely with 
abstract intelligence and with traits of conformity as expressed in 
industry and school attitude; whereas leadership would seem to 
be correlated more highly with such traits as energy and will and 
persistence, traits indicative of vitality, dynamic activity, in contrast 
with industry as routine effort. For leadership, increments in intelli- 
gence above average mentality appear less significant than for 
academic success in the advanced grades, and less significant for 
leadership than individual differences in other traits of personality 
and character.” M. H. Willing.

University of Wisconsin. 




THE MENTAL ABILITY OF NATIONALITY GROUPS 


Intelligence and Immigration, by Clifford Kirkpatrick. Baltimore: 
The Williams and Wilkins Co., 1926. Pp. XIII + 127. 


This interesting and suggestive volume contains, as an introduction 
to the report of the author’s experiment in racial psychology, what is 
probably the best summary of the work done in connection with the 
measurement of the intelligence of immigrants up to the present time. 
A convenient table summarizes the past work, ranking the different 
nationalities according to the results of the various investigations. 
From an inspection of the table, “the conclusion seems to be that in
general the representatives of the newer immigration, especially 
the Latins, have less intelligence than the Americans or those of the 
older immigrant stock.”

Admitting that such a conclusion was tentative, the author set out 
to investigate the question further and also to study the possibility 
of environmental and linguistic handicaps upon test results. The 
subjects were school children, all of eleven years of age, residents of 
four or five small New England cities. The racial groups represented 
were Americans, Finns, Italians, and French Canadians. The tests 
used were the Army Beta, as representative of a non-language test, 
and the Illinois Examination, which consists of a verbal intelligence 
test, a reading test, and an arithmetic test. 

His conclusions contain the following: “Americans are but slightly,
if at all, superior to the Finns in intelligence. Both are far above the
Italians, and the French Canadians, taken as a whole, rank between
these two extremes. 

“These differences are accentuated by a linguistic handicap.” 

Evidence for the last statement consists principally of the fact 
that the immigrant groups do relatively better on the non-language 
Beta than on the Illinois examination, and that they do relatively 
better in Arithmetic than in Reading. The author has also worked out an
interesting correlation between test score and the use of English in the 
home. This averages .47 with the Illinois Examination and .32 with 
Beta. 

Other interesting correlations are those between occupation of the 
father and test results, which average .25 in the case of the Illinois, 
and .20 in the case of Beta; correlations between grade location and 
Illinois, .70, and grade location and Beta, .60. 


If we were to make a snap judgment of the conclusions of the author 
we should certainly agree with him that he has demonstrated the 
existence of a linguistic handicap (and perhaps also of a social and 
economic handicap), but we do not think that he has demonstrated so 
conclusively that over and above these environmental handicaps there 
are still native differences between the races he has examined. 

If we wish to know how much language ability contributed to the 
Illinois score, we may find the partial correlation coefficient between 
the use of English in the home and the Illinois score, with Beta constant. 
This is .38. If we turn this into a percentage equivalent (according to 
an article by P. H. Nygaard in Journal of Educational Psychology, 
Feb., 1926) we find that it is equivalent to 29 per cent. I wonder if 
we are correct in assuming that language ability contributed 29 per 
cent of the score. If we are, then such a percentage would seriously 
affect any conclusions based upon differences in scores which would 
not be greater than that. Again the partial correlation between occu- 
pation of the parent and Illinois, with Beta constant, is .16, which 
according to the same table, is equivalent to 14 per cent, another 
percentage to be taken into account in drawing any conclusions from 
differences in scores. 
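The partial coefficient used above follows the standard first-order formula r12.3 = (r12 − r13·r23)/√((1 − r13²)(1 − r23²)), with 1 = use of English in the home, 2 = Illinois score, and 3 = Beta score. In this minimal sketch the zero-order values .47 and .32 are the averages quoted earlier in the review (averaging coefficients across groups is itself a simplification), while the Illinois–Beta correlation is not reported, so 0.50 is a hypothetical placeholder; the printed figure is an illustration, not a check on the reviewer's .38.

```python
import math


def partial_r(r12: float, r13: float, r23: float) -> float:
    """First-order partial correlation r12.3: variables 1 and 2 with 3 held constant."""
    denom = math.sqrt((1.0 - r13**2) * (1.0 - r23**2))
    if denom == 0.0:
        raise ValueError("undefined when |r13| or |r23| equals 1")
    return (r12 - r13 * r23) / denom


# 1 = use of English in the home, 2 = Illinois score, 3 = Beta score.
r_english_illinois = 0.47  # average reported in the review
r_english_beta = 0.32      # average reported in the review
r_illinois_beta = 0.50     # HYPOTHETICAL: not reported in the review

print(f"r12.3 = {partial_r(r_english_illinois, r_english_beta, r_illinois_beta):.2f}")
```

Holding Beta constant removes the share of the English-in-the-home relationship that both tests have in common, leaving the part peculiar to the verbal Illinois examination.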

It seems that all told we have enough data collected by mental 
testers to solve some of the questions concerning native and racial 
differences. Will some competent statistician and psychologist help 
us out by subjecting the existing data to a rigorous analysis? 

W. D. Commins. 
St. Louis University. 





EXTRA-CURRICULAR PRACTICES 


Extra-curricular Activities in the Junior High School, by Paul W. 
Terry. Baltimore: Warwick and York, 1926. Pp. 122. 


This monograph reports and comments on information secured by 
questionnaire from 74 junior high schools relative to extra-classroom
activities. The territorial distribution of the schools is not given.
In size they are almost equally divided among small (less than 500
pupils), medium (between 500 and 999 pupils), and large (more than
1000 pupils); they average 825 pupils.

Succinctly, the most significant findings are as follows: Provision
for individual differences as measured by the number and kinds of
organizations in a school varies greatly from small to large schools 
and even within the same classification. Athletic, musical, and 
“English” activities constitute the majority of the organizations. 
Minor sports seem to be preferred to football. The school paper 
appears in 67 per cent of the schools, the annual in but 20 per cent. 
Pupils are rather generally encouraged “to join” by means of pins, 
monograms, letters, numerals, prizes, certificates, or school credit. 
Less than half of the schools keep records of pupil participation, but 
those that do think them highly valuable for guidance purposes. Var-
ious types of limitations on participation are reported, betraying con-
fused points of view. Many schools apparently still are nervous
about the welfare of the classroom work and seek to make its proper 
accomplishment a condition to taking part in the extra-curricular 
enterprise. Both the pupil’s interests and the organization’s interests 
are being increasingly taken into account in admission procedures. 
Teachers are encouraged to take charge of the activities mainly as a
fulfillment of duty or to secure better rating; and only infrequently
are they granted money or reduction in teaching load for the work. 
The chief problems at present are those relating to time provision and 
the interest and fitness of teachers. 

The study is clearly, concisely, and systematically reported. It is 
significant for its facts and for the valuable principles which it brings 
into relief through the interpretation and discussion of these facts. 
The extra-curricular program for the junior high school is still far from 
standardized or adequately evaluated, and its sponsors very much need 
exactly such critical views of its transitional stages as this little book 
presents. M. H. Willing.

University of Wisconsin. 





LATIN FOR THE SAKE OF ENGLISH 


A Study of the Quality of English in Latin Translation, by Maxie
Nave Woodring. New York City: Teachers College, 1925. Pp. 
VIII + 84. 


The first purpose in teaching Latin, according to high school 
teachers of the subject, is “to increase the ability to read, speak, and
write English.” Is this purpose achieved? Dr. Woodring has con-
tributed to the answer through a study of 150 translations by four-year
Latin pupils of a sight passage from Cicero given in the College Board 
examination of 1922. These translations, taken from the 50 best 
papers, 50 average papers, and the 50 poorest papers, were scored for 
the quality of their English, first by classifying each sentence as 
Acceptable English, Translation English, or No Meaning English, and, 
second, by comparing them with two translations written by two 
Latin specialists. They were further measured by the Hudelson 
English Scale and compared with compositions written by the same 
pupils in the English Examination of that year, likewise measured by 
the Hudelson Scale. Finally, they were analyzed as to choice of 
words, and as to errors in grammar, rhetoric, and spelling. 

Some of the conclusions reached are as follows: Only one-third of 
the translations were acceptable English. The translations showed 
little discriminating choice of words. There was little relationship 
between the quality of the English in the translation and that of 
the English in the compositions. Four-year pupils display a limited 
vocabulary in the translations. Errors in grammar and rhetoric 
were numerous. 

The study as a whole is interesting, ingenious, and provocative. 
It should be checked by one dealing with studied translations and 
non-examination English composition, though it is unlikely that there 
would be significant change in the findings. 

M. H. Willing.





OTHER PUBLICATIONS RECEIVED 


A. PUBLICATIONS IN EDUCATIONAL PSYCHOLOGY


BROWN, ANDREW W.: The Unevenness of the Abilities of Dull and of
Bright Children. New York: Teachers College, Columbia University, 
1926, pp. 112. $1.50. 

PIAGET, JEAN: La Représentation du monde chez l’enfant. Paris:
Librairie Félix Alcan, 1926, pp. 424. 40 francs.

SKINNER, CHARLES E.; GAST, IRA M.; and SKINNER, HARLEY C.:
(Edited by) Readings in Educational Psychology. New York: D.
Appleton and Company, 1926, pp. 833.

THORNDIKE, E. L., and others: The Measurement of Intelligence.
New York: Teachers College, Columbia University. No date, pp.
XXVI + 616.




B. PUBLICATIONS IN PSYCHOLOGY


BENTLEY, MADISON; DUNLAP, KNIGHT; HUNTER, WALTER S.;
KOFFKA, KURT; KÖHLER, WOLFGANG; McDOUGALL, WILLIAM;
PRINCE, MORTON; WATSON, JOHN B.; and WOODWORTH, ROBERT S.:
Psychologies of 1925. Powell Lectures in Psychological Theory. Worcester,
Massachusetts: Clark University, 1926, pp. 412. $6.00.

FENTON, NORMAN: Shell Shock and Its Aftermath. St. Louis,
Missouri: The C. V. Mosby Company, 1926, pp. 173. $3.00. 

HOLLINGWORTH, H. L.: Mental Growth and Decline. New York:
D. Appleton and Company, 1927, pp. 395. $3.00.

SLAWSON, JOHN: The Delinquent Boy. Boston, Mass.: Richard G. 
Badger, 1926, pp. 477. 


C. PUBLICATIONS IN THE GENERAL EDUCATIONAL FIELD 


COLLINGS, ELLSWORTH: School Supervision in Theory and Practice.
New York: Thomas Y. Crowell Company, 1927, pp. 368. $2.75 net. 

COOK, WILLIAM A.: Federal and State School Administration. New
York: Thomas Y. Crowell Company, 1927, pp. 373. $2.75 net. 

CUROE, PHILIP R. V.: Principles of Education. New York:
Globe Book Company, 1926, pp. 137. 

GRANRUD, JOHN: The Organization and Objectives of State Teachers 
Associations. New York: Teachers College, Columbia University, 
1926, pp. 71. $1.50. 

HART, JOSEPH K.: Light from the North. The Danish Folk High-
schools: Their Meanings for America. New York: Henry Holt and 
Company, 1926, pp. 159. $1.50. 

JONES, WALTER B.: Job Analysis and Curriculum Construction in 
the Metal Trades Industry. New York: Teachers College, Columbia 
University, 1926, pp. 104. $1.50. 

LOWTH, FRANK J.: Everyday Problems of the Country Teacher.
New York: The Macmillan Company, 1926, pp. 563. 

MAVERICK, LEWIS A.: The Vocational Guidance of College Students.
Cambridge, Mass.: Harvard University Press, 1926, pp. 251. $2.50. 

OLSEN, HANS C.: The Work of Boards of Education. New York:
Teachers College, Columbia University, 1926, pp. 170. $1.50. 

SLOMAN, LAURA G.: Some Primary Methods. New York: The
Macmillan Company, 1927, pp. 293. 

STEELE, ROBERT M.: A Study of Teacher Training in Vermont. New
York: Teachers College, Columbia University, 1926, pp. 111. $1.50. 

STROH, M. MARGARET: Literature for Grades VII, VIII, and IX.
New York: Teachers College, Columbia University, 1926, pp. 110.
$1.50. 

THOMAS, FRANK W.: Principles and Technique of Teaching. Boston,
Mass.: Houghton Mifflin Company, 1927, pp. 410. $2.00.

TURNBULL, G. H.: The Educational Theory of J. G. Fichte. The
University Press of Liverpool; Hodder and Stoughton, London,
1926, pp. 283. 12s. 6d. net.


D. NEW SCHOOL TEXTBOOKS


DICKSON, MARGUERITE S.: American History for Grammar Schools.
New York: The Macmillan Company, 1927, pp. 313. 

HAGEMAN, SADYE M.: Children of Grizzly. How They Learned the
Secrets of Health. Yonkers, New York: World Book Company, 1927, 
pp. 176. $1.00. 

McMURRY, FRANK M., and PARKINS, A. E.: World Geography.
Book II. The Old World. New York: The Macmillan Company,
1927, pp. 323.


E. OTHER PUBLICATIONS 


AMERICAN LIBRARY ASSOCIATION: Proposed Classification and Com-
pensation Plans for Library Positions. Washington, D.C.: Published by
the Bureau of Public Personnel Administration, 1927, pp. 208. $2.15.

DEXTER, ROBERT C.: Social Adjustment. New York: Alfred A.
Knopf, 1927, pp. 424. 

SNOW, A. J.: Matter and Gravity in Newton’s Physical Philosophy.
A Study in the Natural Philosophy of Newton’s Time. London: Oxford 
University Press, 1926, pp. 256. (American Branch, New York.) 
$2.50. 

TRUITT, RALPH P.: Team-work in the Prevention of Crime. New
York: The Joint Committee on Methods of Preventing Delinquency,
November, 1926, pp. 20.


F. NEW STANDARDIZED TESTS


SCHORLING, RALEIGH; CLARK, JOHN R.; and LINDELL, SELMA A.:
Instructional Tests in Algebra, with Goals for Pupils of Varying Abilities. 
Yonkers, New York: World Book Company, 1927, pp. 72. 








